
DECLARATION OF PAUL POLAKIS, Ph.D.
I, Paul Polakis, Ph.D., declare and say as follows:

1. I am currently employed by Genentech, Inc. where my job title is Staff
Scientist.

2. Since joining Genentech in 1999, one of my primary responsibilities has
been leading Genentech's Tumor Antigen Project, which is a large research
project with a primary focus on identifying tumor cell markers that find use
as targets for both the diagnosis and treatment of cancer in humans.

3. As I stated in my previous Declaration dated May 7, 2004 (attached as
Exhibit A), my laboratory has been employing a variety of techniques,
including microarray analysis, to identify genes which are differentially
expressed in human tumor tissue relative to normal human tissue. The
primary purpose of this research is to identify proteins that are abundantly
expressed on certain human tumor tissue(s) and that are either (i) not
expressed, or (ii) expressed at detectably lower levels, on normal tissue(s).

4. In the course of our research using microarray analysis, we have identified
approximately 200 gene transcripts that are present in human tumor tissue
at significantly higher levels than in normal human tissue. To date, we
have successfully generated antibodies that bind to 31 of the tumor antigen
proteins expressed from these differentially expressed gene transcripts and
have used these antibodies to quantitatively determine the level of
production of these tumor antigen proteins in both human tumor tissue and
normal tissue. We have then quantitatively compared the levels of mRNA
and protein in both the tumor and normal tissues analyzed. The results of
these analyses are attached herewith as Exhibit B. In Exhibit B, "+" means
that the mRNA or protein was detectably overexpressed in the tumor tissue
relative to normal tissue and "-" means that no detectable overexpression
was observed in the tumor tissue relative to normal tissue.

5. As shown in Exhibit B, of the 31 genes identified as being detectably
overexpressed in human tumor tissue as compared to normal human tissue
at the mRNA level, 28 of them (i.e., greater than 90%) are also detectably
overexpressed in human tumor tissue as compared to normal human tissue
at the protein level. As such, in the cases where we have been able to
quantitatively measure both (i) mRNA and (ii) protein levels in both (i)
tumor tissue and (ii) normal tissue, we have observed that in the vast
majority of cases, there is a very strong correlation between increases in
mRNA expression and increases in the level of protein encoded by that
mRNA.
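
The percentage quoted in paragraph 5 follows directly from the two counts given there (31 genes overexpressed at the mRNA level, 28 of which are also overexpressed at the protein level). A minimal arithmetic check is sketched below; the function name is illustrative only and is not part of the declaration or of Exhibit B.

```python
def concordance_percent(concordant: int, total: int) -> float:
    """Share of genes whose protein result agrees with the mRNA result, in percent."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * concordant / total


# Counts stated in paragraph 5: 28 of 31 genes overexpressed at both levels.
pct = concordance_percent(28, 31)
print(f"{pct:.1f}% of the genes are concordant")
```

The result is slightly above 90%, which agrees with the "greater than 90%" wording of the paragraph.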

6. Based upon my own experience accumulated in more than 20 years of
research, including the data discussed in paragraphs 4-5 above and my
knowledge of the relevant scientific literature, it is my considered scientific
opinion that for human genes, an increased level of mRNA in a tumor
tissue relative to a normal tissue more often than not correlates to a similar
increase in abundance of the encoded protein in the tumor tissue relative to
the normal tissue. In fact, it remains a generally accepted working
assumption in molecular biology that increased mRNA levels are more
often than not predictive of elevated levels of the encoded protein. In fact,
an entire industry focusing on the research and development of therapeutic
antibodies to treat a variety of human diseases, such as cancer, operates on
this working assumption.

7. I hereby declare that all statements made herein of my own knowledge are
true and that all statements made on information or belief are believed to be
true, and further that these statements were made with the knowledge that
willful false statements and the like so made are punishable by fine or
imprisonment, or both, under Section 1001 of Title 18 of the United States
Code and that such willful statements may jeopardize the validity of the
application or any patent issued thereon.





Paul Polakis, Ph.D. 

EXHIBIT A

DECLARATION OF PAUL POLAKIS, Ph.D.
I, Paul Polakis, Ph.D., declare and say as follows: 

1. I was awarded a Ph.D. by the Department of Biochemistry of the Michigan
State University in 1984. My scientific Curriculum Vitae is attached to and forms
part of this Declaration (Exhibit A).

2. I am currently employed by Genentech, Inc. where my job title is Staff
Scientist. Since joining Genentech in 1999, one of my primary responsibilities has
been leading Genentech's Tumor Antigen Project, which is a large research project
with a primary focus on identifying tumor cell markers that find use as targets for
both the diagnosis and treatment of cancer in humans.

3. As part of the Tumor Antigen Project, my laboratory has been analyzing 
differential expression of various genes in tumor cells relative to normal cells. 
The purpose of this research is to identify proteins that are abundantly expressed 
on certain tumor cells and that are either (i) not expressed, or (ii) expressed at 
lower levels, on corresponding normal cells. We call such differentially expressed 
proteins "tumor antigen proteins". When such a tumor antigen protein is 
identified, one can produce an antibody that recognizes and binds to that protein. 
Such an antibody finds use in the diagnosis of human cancer and may ultimately 
serve as an effective therapeutic in the treatment of human cancer.

4. In the course of the research conducted by Genentech's Tumor Antigen
Project, we have employed a variety of scientific techniques for detecting and
studying differential gene expression in human tumor cells relative to normal cells,
at genomic DNA, mRNA and protein levels. An important example of one such
technique is the well known and widely used technique of microarray analysis
which has proven to be extremely useful for the identification of mRNA molecules
that are differentially expressed in one tissue or cell type relative to another. In the
course of our research using microarray analysis, we have identified
approximately 200 gene transcripts that are present in human tumor cells at
significantly higher levels than in corresponding normal human cells. To date, we
have generated antibodies that bind to about 30 of the tumor antigen proteins
expressed from these differentially expressed gene transcripts and have used these
antibodies to quantitatively determine the level of production of these tumor
antigen proteins in both human cancer cells and corresponding normal cells. We
have then compared the levels of mRNA and protein in both the tumor and normal
cells analyzed.

5. From the mRNA and protein expression analyses described in paragraph 4
above, we have observed that there is a strong correlation between changes in the
level of mRNA present in any particular cell type and the level of protein
expressed from that mRNA in that cell type. In approximately 80% of our
observations we have found that increases in the level of a particular mRNA
correlate with changes in the level of protein expressed from that mRNA when
human tumor cells are compared with their corresponding normal cells.

6. Based upon my own experience accumulated in more than 20 years of
research, including the data discussed in paragraphs 4 and 5 above and my
knowledge of the relevant scientific literature, it is my considered scientific
opinion that for human genes, an increased level of mRNA in a tumor cell relative
to a normal cell typically correlates to a similar increase in abundance of the
encoded protein in the tumor cell relative to the normal cell. In fact, it remains a
central dogma in molecular biology that increased mRNA levels are predictive of
corresponding increased levels of the encoded protein. While there have been
published reports of genes for which such a correlation does not exist, it is my
opinion that such reports are exceptions to the commonly understood general rule
that increased mRNA levels are predictive of corresponding increased levels of the
encoded protein.

7. I hereby declare that all statements made herein of my own knowledge are
true and that all statements made on information or belief are believed to be true,
and further that these statements were made with the knowledge that willful false
statements and the like so made are punishable by fine or imprisonment, or both,
under Section 1001 of Title 18 of the United States Code and that such willful
statements may jeopardize the validity of the application or any patent issued
thereon.



Dated: May 7, 2004




Paul Polakis, Ph.D. 

Abstract

The BMI-1 gene is a putative oncogene belonging to the Polycomb
group family that cooperates with c-myc in the generation of mouse
lymphomas and seems to participate in cell cycle regulation and
senescence by acting as a transcriptional repressor of the INK4a/ARF locus.
The BMI-1 gene has been located on chromosome 10p13, a region involved
in chromosomal translocations in infant leukemias, and amplified in
occasional non-Hodgkin's lymphomas (NHLs) and solid tumors. To
determine the possible alterations of this gene in human malignancies, we
have examined 160 lymphoproliferative disorders, 13 myeloid leukemias,
and 89 carcinomas by Southern blot analysis and detected BMI-1 gene
amplification (3- to 7-fold) in 4 of 36 (11%) mantle cell lymphomas
(MCLs) with no alterations in the INK4a/ARF locus. BMI-1 and p16INK4a
mRNA and protein expression were also studied by real-time quantitative
reverse transcription-PCR and Western blot, respectively, in a subset of
NHLs. BMI-1 expression was significantly higher in chronic lymphocytic
leukemia and MCL than in follicular lymphoma and large B cell
lymphoma. The four tumors with gene amplification showed significantly
higher mRNA levels than other MCLs and NHLs with the BMI-1 gene in
germline configuration. Five additional MCLs also showed very high
mRNA levels without gene amplification. A good correlation between
BMI-1 mRNA levels and protein expression was observed in all types of
lymphomas. No relationship was detected between BMI-1 and p16INK4a
mRNA levels. These findings suggest that BMI-1 gene alterations in
human neoplasms are uncommon, but they may contribute to the
pathogenesis in a subset of malignant lymphomas, particularly of mantle cell
type.

Introduction 

The BMI-1 gene is a putative oncogene of the Polycomb group
originally identified by retroviral insertional mutagenesis in
Eμ-c-myc transgenic mice infected with the Moloney murine leukemia
virus (1, 2). These animals had a rapid development of pre-B cell
lymphomas showing frequent proviral insertions near the BMI-1 gene.
This integration resulted in BMI-1 overexpression suggesting a
cooperative effect between C-MYC and BMI-1 genes in the development of
these tumors (3, 4). Recent studies have indicated that the BMI-1 gene
may also participate in cell cycle control and senescence through the
INK4a/ARF locus by acting as an upstream negative regulator of
p16INK4a and p14/p19ARF gene expression (5). The human BMI-1
gene has been mapped to chromosome 10p13 (6), a region involved in
chromosomal translocations in infant leukemias (7) and
rearrangements in malignant T cell lymphomas (8, 9). More recently, high-level
DNA amplifications of this region have been found by comparative
genomic hybridization in NHLs and solid tumors (10, 11). However,
the possible implication of the BMI-1 gene in these alterations and its
role in the pathogenesis of human tumors is not known. The aim of
this study was to analyze the possible BMI-1 gene alterations and
expression in a large series of human neoplasms and to determine the
relationship with INK4a/ARF locus aberrations.

Materials and Methods 

Case Selection. A series of 262 human tumors, including 173 hematological
malignancies and 89 carcinomas (Table 1), matched normal tissues from
all carcinomas, 11 samples of normal peripheral mononuclear cells, and 5
reactive lymph nodes and tonsils, were selected based on the availability of
frozen samples for molecular analysis.

DNA Extraction and Southern Blot Analysis. Genomic DNA was obtained
using Proteinase K/RNase treatment. 15 μg were digested with EcoRI
and HindIII restriction enzymes (Life Technologies, Inc., Gaithersburg, MD),
for Southern blot analysis and hybridized with a 1.5-kb PstI fragment of the
partial BMI-1 cDNA (6).

RNA Extraction and Real-time Quantitative RT-PCR. Total RNA was
obtained from 67 lymphoid neoplasms (10 CLLs, 27 MCLs, 8 FLs, and 22
LCLs) using guanidine/isothiocyanate extraction and cesium/chloride gradient
centrifugation. One μg of total RNA was transcribed into cDNA using
MMLV-reverse transcriptase (Life Technologies, Inc.) and random hexamers,
following manufacturer's directions. Sequences of the BMI-1 and the p16
detection probes and primers were designed using the Primer Express program
(Applied Biosystems, Foster City) as follows: BMI-1 sense,
5'-CTGGTTGCCCATTGACAGC-3'; BMI-1 antisense,
5'-CAGAAAATGAATGCGAGCCA-3'; p16 sense,
5'-CAACGCACCGAATAGTTACGG-3'; p16 antisense,
5'-AACTTCGTCCTCCAGAGTCGC-3'. The probes BMI-1,
5'-CAGCTCGCTTCAAGATGGCCGC-3', and p16,
5'-CGGAGGCCGATCCAGGTGGGTA-3', were labeled with
6-carboxy-fluorescein as the reporter dye. The
TaqMan-GAPDH Control Reagents (Applied Biosystems) were used to
amplify and detect the GAPDH gene, as recommended by the manufacturer. The
quantitative assay amplified 1 μl of cDNA in two to four replicates using the
primers and probes described above and the standard master mix (Applied
Biosystems). All reactions were performed in an ABI PRISM 7700 Sequence
Detector System (Applied Biosystems). GAPDH, BMI-1, and p16INK4a
expression was related to a standard curve derived from serial dilutions of Raji
cDNA. The RUs of BMI-1 and p16INK4a expression were defined as the
mRNA levels of these genes normalized to the GAPDH expression level in
each case.
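
The normalization described above (target and GAPDH quantities read from a standard curve, then expressed as a ratio) can be illustrated with a short sketch. This is not the authors' analysis software: the threshold-cycle (Ct) values, the dilution factors, and all function names below are hypothetical, and a simple least-squares fit of Ct against log10(quantity) is assumed as the form of the standard curve.

```python
import math


def fit_standard_curve(quantities, cts):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to read a relative quantity from a Ct value."""
    return 10 ** ((ct - intercept) / slope)


def relative_units(target_qty, gapdh_qty):
    """Target quantity normalized to the GAPDH quantity of the same sample."""
    return target_qty / gapdh_qty


# Hypothetical serial dilutions of a reference cDNA and their Ct values.
dilutions = [1.0, 0.1, 0.01, 0.001]
target_cts = [20.0, 23.3, 26.6, 29.9]
gapdh_cts = [15.0, 18.3, 21.6, 24.9]

t_slope, t_int = fit_standard_curve(dilutions, target_cts)
g_slope, g_int = fit_standard_curve(dilutions, gapdh_cts)

# Hypothetical sample: Ct 23.3 for the target gene, Ct 18.3 for GAPDH.
sample_target = quantity_from_ct(23.3, t_slope, t_int)
sample_gapdh = quantity_from_ct(18.3, g_slope, g_int)
ru = relative_units(sample_target, sample_gapdh)
print(f"relative units: {ru:.2f}")
```

Dividing by the GAPDH quantity of the same sample compensates for differences in the amount of input cDNA between samples, which is why the text reports expression as a ratio rather than as a raw quantity.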

Protein Analysis. Whole-cell protein extracts were obtained from
additional frozen tissue available in 31 cases (7 CLLs, 12 MCLs, 8 FLs, and 4
LCLs), loaded onto a 10% SDS-polyacrylamide gel, and electroblotted to a
nitrocellulose membrane (Amersham). Blocked membranes were incubated
sequentially with the monoclonal antibody BMI-F6 (12), antimouse
conjugated to horseradish peroxidase (Amersham), and detected by enhanced
chemiluminescence (Amersham) according to the manufacturer's recommendations.

Table 1 Hematological malignancies and solid tumor samples analyzed for BMI-1
gene alterations

Tissue samples                                  No. of cases
Hematological malignancies
  Hodgkin's disease                                        2
  B cell lymphoproliferative disorders
    B-Acute lymphoblastic leukemia                        14
    CLL                                                   29
    Hairy cell leukemia                                    4
    FL                                                    15
    MCL                                                   36
    LCL                                                   40
  T cell lymphoproliferative disorders
    T-Acute lymphoblastic leukemia                         8
    Large granular cell leukemia                           4
    Peripheral T-cell lymphoma                             8
  Myeloproliferative disorders
    Acute myeloid leukemia                                 7
    Chronic myeloid leukemia                               6
Solid tumors
  Colon carcinoma                                         26
  Breast carcinoma                                        29
  Laryngeal squamous cell carcinoma                       34
Total                                                    262
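
The case counts in Table 1 can be cross-checked against the totals stated in the text (262 tumors, of which 173 are hematological malignancies and 89 are carcinomas; the abstract further splits the hematological cases into 160 lymphoproliferative disorders and 13 myeloid leukemias). The sketch below is only an arithmetic check; the dictionary layout and variable names are illustrative, and the count of 8 for peripheral T-cell lymphoma is taken as the value that reconciles the stated totals.

```python
# Case counts per diagnosis, grouped as in Table 1.
lymphoproliferative = {
    "Hodgkin's disease": 2,
    "B-Acute lymphoblastic leukemia": 14,
    "CLL": 29,
    "Hairy cell leukemia": 4,
    "FL": 15,
    "MCL": 36,
    "LCL": 40,
    "T-Acute lymphoblastic leukemia": 8,
    "Large granular cell leukemia": 4,
    # 8 is the only value consistent with the totals stated in the text.
    "Peripheral T-cell lymphoma": 8,
}
myeloproliferative = {
    "Acute myeloid leukemia": 7,
    "Chronic myeloid leukemia": 6,
}
solid_tumors = {
    "Colon carcinoma": 26,
    "Breast carcinoma": 29,
    "Laryngeal squamous cell carcinoma": 34,
}

n_lympho = sum(lymphoproliferative.values())
n_myeloid = sum(myeloproliferative.values())
n_solid = sum(solid_tumors.values())
n_total = n_lympho + n_myeloid + n_solid

print(n_lympho, n_myeloid, n_solid, n_total)
```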

Statistical Analysis. Because of the non-normal distribution of the samples
and the small size of some subsets of tumors, the statistical evaluation was
performed using nonparametric tests (SPSS, version 9.0). Comparison between
mRNA expression levels in the different groups of NHLs was performed using
the Kruskal-Wallis Test, with a P for significance set at 0.05. For differences
between particular groups, the conservative Bonferroni procedure was
performed, and the P was set at 0.005. The remaining statistical analyses were
carried out using the Mann-Whitney nonparametric U test (significance, P
<0.05). The comparison between BMI-1 and p16INK4a quantitative mRNA
levels was also performed using the Pearson's correlation coefficient.
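
The Bonferroni procedure divides the overall significance level by the number of comparisons. The text does not state how many pairwise comparisons were made; the sketch below assumes, purely for illustration, that all pairs among five groups (CLL, MCL, MCL with amplification, FL, LCL) were compared, which happens to reproduce the stated threshold of 0.005. The group list and function name are assumptions, not taken from the paper.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


# Assumed grouping: five tumor groups, all pairwise comparisons.
groups = ["CLL", "MCL", "MCL (amplified)", "FL", "LCL"]
n_pairs = comb(len(groups), 2)  # number of distinct pairs of groups

threshold = bonferroni_threshold(0.05, n_pairs)
print(f"{n_pairs} comparisons, per-comparison P threshold {threshold:.3f}")
```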



Results 

BMI-1 Gene Amplification. The BMI-1 gene was examined by
Southern blot in a large series of human tumors and normal samples
(Table 1). The cDNA probe used in the study detected three EcoRI
fragments of 7.3, 3.8, and 2.6 kb and three HindIII fragments of 6.2,
4, and 3.5 kb. BMI-1 gene amplification (3- to 7-fold) was detected in
4 of 36 (11%) MCLs (Fig. 1). The amplifications were confirmed with
both restriction enzymes. The amplified MCLs were two blastoid and
two typical variants. No amplifications were observed in any of the
solid tumors when compared with their respective matched
non-neoplastic mucosa. No BMI-1 gene rearrangements were observed in
any of the samples examined.

BMI-1 mRNA Expression. To determine the BMI-1 expression
pattern in NHL we analyzed BMI-1 mRNA levels by real-time
quantitative RT-PCR in 67 lymphomas (10 CLLs, 27 MCLs, 8 FLs, and 22
LCLs), including the four tumors with gene amplification. A distinct
BMI-1 mRNA expression pattern was observed in the different types
of lymphomas (Fig. 2; Kruskal-Wallis Test; P < 0.001). The BMI-1
mRNA levels in CLLs (mean, 2.2 RU; SD, 1.3) and MCLs with no
BMI-1 gene amplification (mean, 2.5 RU; SD, 2.3) were significantly
higher than in FLs (mean, 0.9 RU; SD, 0.8) and LCLs (mean, 0.6 RU;
SD, 0.4; Mann-Whitney nonparametric U test; P < 0.01). The 4
MCLs with BMI-1 gene amplification showed significantly higher
levels of expression than all other groups of tumors (mean, 5.1 RU;
SD, 1.6; P < 0.005). In addition, five typical MCLs with no structural
alterations of the gene also showed very high levels of BMI-1 mRNA
expression ranging from 4 to 9.8 RU, similar to cases with gene
amplification (Fig. 2A).

BMI-1 Protein Expression. BMI-1 protein expression was
examined by Western blot in 31 tumors (7 CLLs; 12 MCLs, including two
cases with BMI-1 gene amplification and 4 cases with mRNA
overexpression and no structural alteration of the gene; 8 FLs, and 4 LCLs)
in which additional frozen tissue was available. The monoclonal
antibody against BMI-1 detected three closely migrating proteins of
Mr 45,000-48,000 (2). The two more slowly migrating bands
probably represent phosphorylated isoforms of the protein (12). The two
MCLs with gene amplification and three of four cases with mRNA
overexpression without amplification of the gene showed very high
levels of protein expression. The remaining MCLs and CLLs showed
intermediate levels of protein expression, whereas low- or
no-expression signals were detected in the LCLs and FLs included in the study
(Fig. 3). These results indicate that BMI-1 protein expression in NHL
is concordant with the mRNA levels observed by real-time
quantitative RT-PCR.

Relationship between BMI-1 and p16INK4a Gene Alterations.
The INK4a/ARF locus has been recently identified as a downstream
target of the transcriptional repressing activity of the BMI-1 gene,
suggesting that this gene may contribute to human neoplasias with
wild type INK4a/ARF (5). Most of the lymphoproliferative disorders
analyzed in the present study, including the four cases with BMI-1
gene amplification, had been previously examined for p53 gene
mutations and INK4a/ARF locus alterations, including gene deletions,
mutations, hypermethylation, and expression (13, 14). The four MCLs
with BMI-1 gene amplification and mRNA overexpression and the
five tumors with BMI-1 mRNA overexpression with no structural
alterations of the gene showed a wild-type configuration of the
INK4a/ARF locus (13). However, one case with BMI-1 gene
amplification and one case with mRNA overexpression with no alteration of
the gene showed p53 gene mutations associated with allelic deletions.

To determine the possible relationship between BMI-1 and
p16INK4a mRNA expression, p16INK4a mRNA levels were evaluated
by real-time quantitative RT-PCR in 50 tumors (10 CLLs, 27 MCLs,
and 13 LCLs), including 6 cases with alterations in the INK4a/ARF
locus (2 MCLs and 1 LCL with gene deletion, 2 LCLs with
p16INK4a promoter hypermethylation, and 1 CLL with p16INK4a gene
mutation), and the 4 lymphomas with BMI-1 amplification. Negative
or negligible levels of p16INK4a were observed in the 6 tumors with
INK4a/ARF locus alterations. These cases were not included in the
comparisons between BMI-1 and p16INK4a mRNA expression. The
p16INK4a expression levels were relatively similar in the different
types of tumors. Only LCLs tended to have lower levels of expression,
but the differences did not reach statistical significance (Fig. 2B). No
differences were observed in the p16INK4a mRNA levels between
tumors with BMI-1 gene amplification and overexpression and
lymphomas with germline configuration of the gene.

Fig. 1. Southern blot analysis of BMI-1 gene. Four MCLs (MCL*) showed BMI-1 gene
amplification (3- to 7-fold) compared with non-neoplastic tissues (N) and other NHLs. No
amplifications or gene rearrangements were detected in the remaining NHLs and
carcinomas included in the study.

Fig. 2. A, quantitative BMI-1 mRNA transcript analysis (median and range) using
real-time RT-PCR in a series of NHLs. MCLs with BMI-1 gene amplification (MCL*)
revealed significantly higher overall BMI-1 mRNA levels than all other types of NHLs,
including MCLs with no structural alterations of the gene (P < 0.005). MCLs and CLLs
expressed significantly higher levels than FLs and LCLs (P < 0.001). Results are depicted
as the ratio of absolute BMI-1:GAPDH mRNA transcript numbers (RU). Bars, SD.
B, quantitative p16INK4a mRNA transcript analysis (median and range) using real-time
RT-PCR in a series of NHLs. Expression levels were relatively similar in the different
types of tumors. Results are depicted as the ratio of absolute p16INK4a:GAPDH mRNA
transcript numbers (RU). Bars, SD.

Discussion 

In the present study, we have examined a large series of human
tumors for the presence of gene alterations and mRNA expression of
the BMI-1 gene. Gene amplification was identified in four MCLs.
These tumors showed significantly higher levels of mRNA and
protein expression compared with other lymphomas with BMI-1 in
germline configuration. BMI-1 expression levels were also highly
up-regulated in a subset of MCLs with no apparent structural alterations
of the gene. No alterations were detected in any of the different types
of carcinomas included in the study. BMI-1 is considered an oncogene
belonging to the Polycomb group family of genes. These proteins
mainly act as transcriptional regulators, controlling specific target
genes involved in development, cell differentiation, proliferation, and
senescence. Different studies have shown the implication of BMI-1
overexpression in the development of lymphomas in murine and
feline animal models (3, 4). The findings of the present study indicate
for the first time that BMI-1 gene alterations in human neoplasms are
an uncommon phenomenon, but they seem to occur mainly in a subset
of NHLs, particularly of mantle cell type.

The human BMI-1 gene has been mapped to chromosome 10p13.
High-level DNA amplifications and gains in this region have been
identified by comparative genomic hybridization in occasional solid
tumors and NHLs (10, 11). Different chromosomal translocations
involving the 10p13 region have also been identified in infant
leukemias and T cell lymphoproliferative disorders (7, 8, 15). Most acute
leukemias with this chromosomal alteration occur in children <12
months of age, whereas it seems to be extremely rare in adults. 10p
translocations in T-cell lymphoproliferative disorders have been
observed mainly in adult T cell leukemia/lymphomas and occasional
cutaneous T cell lymphomas. In our study, we did not observe BMI-1
rearrangements or amplifications in any of the acute leukemias or T
cell lymphomas. However, all of the acute leukemias in this study
were diagnosed in patients over 16 years, and no adult T cell
leukemia/lymphomas or cutaneous lymphomas could be included in the
series. Similarly, high-level DNA amplifications at the 10p13 region
have been detected in head and neck carcinomas and other solid
tumors. Although we found no evidence for BMI-1 gene
rearrangements or amplifications in a substantial set of carcinomas, this does
not exclude the possibility of increased gene expression or protein
levels in these tumors. Additional studies are required to elucidate the
possible involvement of BMI-1 in these particular groups of human
neoplasms.

In human hematopoietic cells, BMI-1 is preferentially expressed in
primitive CD34+ bone marrow cells, whereas it is negative or very
low in more mature CD34- cells (16). In peripheral lymphocytes, and
particularly in follicular B cells, BMI-1 protein expression has been
detected in resting cells of the mantle zone, whereas it is
down-regulated in proliferating germinal center cells (17, 18). These
observations indicate that BMI-1 expression in normal hematopoietic cells
is tightly regulated in relation with cell differentiation in bone marrow
and antigen-specific response in peripheral lymphocytes. BMI-1
expression in human tumors has not been examined previously. In this
study, we have demonstrated that BMI-1 mRNA and protein
expression show a distinct pattern in different types of lymphomas. Thus,
BMI-1 levels were low in LCLs and FLs and significantly higher in
MCLs and CLLs. These findings suggest that BMI-1 expression
patterns in B cell lymphomas maintain in part the expression profile
of their normal cell counterparts, because FLs and at least a subgroup
of LCLs are considered lymphomas derived from follicular germinal
center cells, whereas MCLs and CLLs are tumors mainly derived from
naive pregerminal center cells. However, the four MCLs with BMI-1
gene amplification expressed significantly higher mRNA levels than
all other tumors. In addition, five MCLs with no structural alterations
of the gene showed high mRNA levels similar to those observed in
tumors with BMI-1 gene amplification, suggesting that other
mechanisms may be involved in up-regulation of the gene in these
lymphomas. Different studies using animal models have shown a
dose-dependent effect of BMI-1 gene expression on skeleton development
and lymphomagenesis (1, 3). These observations suggest that the high
mRNA and protein levels detected in a subset of MCLs may play a
role in the pathogenesis of these neoplasms.

Fig. 3. Western blot analysis of BMI-1 protein in NHLs. The amplified MCL (17624)
showed the highest BMI-1 protein levels, whereas other MCLs and CLLs had intermediate
levels of expression. Very low or negative signal was observed in FLs and LCLs.

Recent studies have identified the INK4a/ARF locus as a
downstream target of the BMI-1 transcriptional repressor activity,
suggesting that BMI-1 overexpression may contribute to human neoplasias
that retain the wild-type INK4a/ARF locus (5). Interestingly, in our
study, BMI-1 amplification and overexpression appeared in tumors
with no alterations in p16INK4a and p14ARF genes. However, we could
not detect differences in the expression levels of p16INK4a in tumors
with and without BMI-1 gene alterations. The reasons for this apparent
discrepancy with experimental observations are not clear. One
possibility may be that genes other than INK4a/ARF are the main targets of
BMI-1 repressor activity in these tumors. Particularly, different genes
of the HOX family are regulated by BMI-1 and may also be involved
in lymphomagenesis (19, 20).

In conclusion, the findings of this study indicate that BMI-1 gene
expression is differentially regulated in B cell lymphomas. Alterations
of the gene seem to be an uncommon phenomenon in human
neoplasms, but they may contribute to the pathogenesis in a subset of
MCLs. Although BMI-1 gene alterations occurred in tumors with
wild-type INK4a/ARF locus, the possible cooperation between these
genes and the oncogenic mechanisms of BMI-1 in human neoplasms
require additional analysis.

Id proteins antagonize basic helix-loop-helix proteins,
inhibit differentiation, and enhance cell proliferation.
In this study we compared the expression of
Id-1, Id-2, and Id-3 in the normal pancreas, in
pancreatic cancer, and in chronic pancreatitis (CP).
Northern blot analysis demonstrated that all three Id
mRNA species were expressed at high levels in
pancreatic cancer samples by comparison with normal or
CP samples. Pancreatic cancer cell lines frequently
coexpressed all three Ids, exhibiting a good
correlation between Id mRNA and protein levels, as
determined by immunoblotting with highly specific anti-Id
antibodies. Immunohistochemistry using these
antibodies demonstrated the presence of faint Id-1 and
Id-2 immunostaining in pancreatic ductal cells in the
normal pancreas, whereas Id-3 immunoreactivity
ranged from weak to strong. In the cancer tissues,
many of the cancer cells exhibited abundant Id-1,
Id-2, and Id-3 immunoreactivity. Scoring on the basis
of percentage of positive cells and intensity of
immunostaining indicated that Id-1 and Id-2 were increased
significantly in the cancer cells by comparison with
the respective controls. Mild to moderate Id
immunoreactivity was also seen in the ductal cells in the
CP-like areas adjacent to these cells and in the ductal
cells of small and interlobular ducts in CP. In
contrast, in dysplastic and atypical papillary ducts in CP,
Id-1 and Id-2 immunoreactivity was as significantly
elevated as in the cancer cells. These findings suggest
that increased Id expression may be associated with
enhanced proliferative potential of pancreatic cancer
cells and of proliferating or dysplastic ductal cells in
CP. (Am J Pathol 1999, 155:815-822)



Basic helix-loop-helix (bHLH) proteins play an important
role as transcription factors in cellular development,
proliferation, and differentiation. The basic domain of the
bHLHs is required for binding to an E-box DNA
sequence, thus promoting transcription of specific target
genes. The HLH domain promotes dimer formation with
various members of the bHLH protein family.
Homodimers of the class B family of bHLH proteins,
including MyoD, NeuroD, and numerous other proteins, are
known to activate tissue-specific genes. These
tissue-specific bHLHs typically form heterodimers with widely
expressed class A bHLHs, which include proteins
encoded by E2A, E2-2, HEB, and other genes (also termed
E-proteins). These heterodimers activate transcription
of genes that are associated with differentiation.

Id genes encode a family of four HLH proteins that lack
the basic DNA binding domain. They act as
dominant-negative HLH proteins by forming high affinity
heterodimers with other bHLH proteins, thereby preventing
them from binding to DNA and inhibiting transcription of
differentiation-associated genes. Id gene
expression is down-regulated on differentiation in many cell
types in vitro and in vivo. In addition, Id proteins seem
to be required for cell cycle progression through G1/S
phase in certain cell types, and interaction between Id-2
and pRB is associated with enhanced proliferation in
some cell lines in vitro.
Pancreatic cancer is the fifth leading cause of cancer
death in the United States, with a mortality rate that
virtually equals its incidence rate. This malignancy is often
associated with the overexpression of a variety of mitogenic
growth factors and their receptors, and with oncogenic
mutations of K-ras and inactivation of the p53 tumor
suppressor gene. We have recently reported that pancreatic
cancers overexpress the HLH protein Id-2, and
that enhanced expression of this protein is evident in the
cytoplasm of the cancer cells within the pancreatic tumor
mass. It is not known, however, whether the expression
of other Id proteins is altered in this malignancy, or
whether their expression is altered in chronic pancreatitis
(CP), an inflammatory disease that is characterized by
dysplastic ducts, foci of proliferating ductal cells, acinar
cell degeneration, and fibrosis. We now report that
there is a five- to sixfold increase in Id-1 and Id-2 mRNA
levels and a twofold increase in Id-3 mRNA levels in
pancreatic cancer by comparison with the normal pancreas.
In contrast, overall Id mRNA levels are not increased
in CP.



Patients and Methods 

Normal human pancreatic tissue samples from 7 male 
and 5 female donors (median age 41.8 years, range 
14-68 years), CP tissues from 13 males and 1 female 
(median age 42.1 years; range 30-56 years), and pan- 
creatic cancer tissues from 10 male and 6 female donors 
(median age 62.6 years; range 53-83 years) were ob- 
tained through an organ donor program and from surgi- 
cal specimens from patients with severe symptomatic 
chronic pancreatitis or pancreatic cancer. A partial 
duodenopancreatectomy (Whipple/pylorus-preserving 
Whipple; n = 13), a left resection of the pancreas (n = 2), 
or a total pancreatectomy (n = 1) was carried out in the
pancreatic cancer patients. According to the TNM classification
of the Union Internationale Contre le Cancer
(UICC), 6 tumors were stage 1, 1 was stage 2, and 9 were
stage 3 ductal cell adenocarcinoma. Freshly removed
tissue samples were fixed in 10% formaldehyde solution
for 12 to 24 hours and paraffin-embedded for histological
analysis. In addition, tissue samples were frozen in liquid
nitrogen immediately on surgical removal and maintained at
-80°C until use for RNA extraction. All studies were
approved by the Ethics Committee of the University of Bern,
Bern, Switzerland, and by the Human Subjects Committee 
at the University of California, Irvine, California. 

Northern Blot Analysis 

Northern blot analysis was carried out as described previously.
Briefly, total RNA was extracted by the single-step
acid guanidinium thiocyanate-phenol-chloroform
method. RNA was size-fractionated on 1.2% agarose/1.8
mol/L formaldehyde gels, electrotransferred onto nylon
membranes, and cross-linked by UV irradiation. Blots
were prehybridized and hybridized with cDNA probes
and washed under high stringency conditions. The following
cDNA probes were used: a 979-bp human Id-1
cDNA probe, a 440-bp human Id-2 cDNA probe, and a
450-bp human Id-3 cDNA probe, covering the entire
coding regions of Id-1, Id-2, and Id-3, respectively. A
BamHI 190-bp fragment of mouse 7S cDNA that hybridizes
with human cytoplasmic RNA was used to confirm
equal RNA loading and transfer. Blots were then exposed
at -80°C to Kodak BioMax-MS films and the resulting
autoradiographs were scanned to quantify the intensity of
the radiographic bands. For each sample the ratio of
Id mRNA expression to 7S expression was calculated. To
compare the relative increase in expression of the respective
Id mRNA species in the cancer and CP samples,
the same normal samples were used for normal/




Figure 1. mRNA expression of Id-1, Id-2, and Id-3 in pancreatic cancer and
chronic pancreatitis. Total RNA (20 μg/lane) from six normal, eight cancerous,
and seven chronic pancreatitis tissue samples was subjected to Northern
blot analysis using 32P-labeled cDNA probes (500,000 cpm/ml) specific
for Id-1, Id-2, and Id-3, respectively. A 7S cDNA probe (50,000 cpm/ml) was
used as a loading and transfer control. Exposure times of the normal/cancer
blots were 1 day for all Id probes, and 2 days for the normal/CP blots.
Exposure time was 4 hours for mouse 7S cDNA. By comparison with the
normal samples, Id-1 and Id-3 mRNA levels were elevated in 8 and 9 cancer
samples, respectively, whereas Id-2 was elevated in 6 cancer samples.

cancer and normal/CP membranes. The median score for
Id-1, Id-2, and Id-3 mRNA levels in these normal samples
was set to 100. Statistical analysis was performed with
SigmaStat software (Jandel Scientific, San Rafael, CA).
The rank sum test was used, and P < 0.05 was taken as
the level of significance.
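
The quantification steps described above (Id/7S ratio per lane, rescaling so that the normal median is 100, then a rank sum comparison) can be sketched as follows. This is only an illustration: the function names and the densitometry readings are made up, and the rank sum test uses a plain normal approximation without tie or continuity correction, which may differ from what SigmaStat computes.

```python
from math import erfc, sqrt
from statistics import median


def loading_normalized(id_density, control_density):
    """Divide each Id band density by the 7S density of the same lane."""
    return [i / c for i, c in zip(id_density, control_density)]


def scaled_to_normal(ratios, normal_ratios):
    """Rescale ratios so that the median of the normal samples equals 100."""
    reference = median(normal_ratios)
    return [100.0 * r / reference for r in ratios]


def rank_sum_p(x, y):
    """Two-sided rank sum test, normal approximation (assumption: no tie
    or continuity correction). Tied values share their average rank."""
    pooled = sorted((v, g) for g, values in enumerate((x, y)) for v in values)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return erfc(abs(z) / sqrt(2))


# Hypothetical readings in arbitrary units, not data from the study.
normal = loading_normalized([10, 12, 8, 11], [100, 100, 100, 100])
cancer = loading_normalized([50, 60, 70, 65], [100, 100, 100, 100])
normal_scaled = scaled_to_normal(normal, normal)
cancer_scaled = scaled_to_normal(cancer, normal)
fold_increase = median(cancer_scaled) / median(normal_scaled)
p_value = rank_sum_p(cancer_scaled, normal_scaled)
```

Because both groups are scaled by the same normal median, the fold increase is simply the ratio of the group medians, and the rank-based test is unaffected by the rescaling.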

Cell Culture and Western Blot Analysis 

PANC-1, MIA-PaCa-2, ASPC-1, and CAPAN-1 human
pancreatic cell lines were obtained from ATCC (Manassas,
VA). COLO-357 human pancreatic cells were a gift
from Dr. R. S. Metzger (Durham, NC). Cells were routinely
grown in DMEM (COLO-357, MIA-PaCa-2, PANC-1) or
RPMI (ASPC-1, CAPAN-1) supplemented with 10% fetal
bovine serum, 100 U/ml penicillin, and 100 μg/ml streptomycin.
For immunoblot analysis, exponentially growing

Figure 2. Densitometric analysis of Northern blots. Autoradiographs of
Northern blots from 12 normal, 14 CP, and 16 pancreatic cancer samples were
analyzed by densitometry. mRNA levels were determined by calculating the
ratio of the optical density for the respective Id mRNA species in relation to
the optical density of mouse 7S cDNA. To compare the relative increase in
expression of the respective Id mRNA species in the cancer and CP samples,
the same normal samples were used for normal/cancer and normal/CP
membranes. Normal pancreatic tissues are indicated by circles, CP tissues by
triangles, and cancer tissues by squares. Data are expressed as median
scores ± SD. By comparison with the normal samples, only the cancer
samples exhibited significant increases: 6.5-fold (P < 0.01) for Id-1, fivefold
(P < 0.01) for Id-2, and twofold (P = 0.027) for Id-3.



cells (60-70% confluent) were solubilized in lysis buffer
containing 50 mmol/L Tris-HCl, pH 7.4, 150 mmol/L NaCl,
1 mmol/L EDTA, 1 μg/ml pepstatin A, 1 mmol/L phenylmethylsulfonyl
fluoride (PMSF), and 1% Triton X-100. Proteins
were subjected to sodium dodecyl sulfate polyacrylamide
gel electrophoresis (SDS-PAGE), transferred to
Immobilon P membranes, and incubated for 90 minutes
with the indicated antibodies and for 60 minutes with
secondary antibodies against rabbit IgG. Visualization
was performed by enhanced chemiluminescence.

[Figure 3 panels: mRNA (Northern blot) and protein (immunoblot) for Id-1, Id-2, and Id-3; band markers 1.2 kb, 14 kDa, 16 kDa]

Figure 3. Id mRNA and protein expression in pancreatic cancer cell lines.
Upper panels: Total RNA (20 μg/lane) from 5 pancreatic cancer cell lines
was subjected to Northern blot analysis using 32P-labeled cDNA probes
(500,000 cpm/ml) specific for Id-1, Id-2, and Id-3, respectively. Exposure
times were 1 day for all Id probes. Lower panels: Immunoblotting. Cell
lysates (30 μg/lane) were subjected to SDS-PAGE. Membranes were probed
with specific Id-1, Id-2, and Id-3 antibodies. Visualization was performed by
enhanced chemiluminescence.



Immunohistochemistry 

Specific rabbit anti-human Id-1 (C-20), Id-2 (C-20), and
Id-3 (C-20; all from Santa Cruz Biotechnology, Santa

Figure 4. Normal and cancerous pancreatic tissues were subjected to immunostaining
using highly specific anti-Id-1 (A-C), anti-Id-2 (D-F), and anti-Id-3
(G-I) antibodies as described in the Methods section. Moderate to strong Id-1
immunoreactivity was present in the cytoplasm of duct-like cancer cells (A
and C, left panel). In the normal pancreas there was weak Id-1 immunoreactivity
in the ductal cells (B). Preabsorption with the Id-1-specific blocking
peptide abolished the Id-1 immunoreactivity (C, right panel). Strong Id-2
immunoreactivity was observed in the cytoplasm of the cancer cells that
exhibited duct-like structures (D and F, left panel), whereas in the normal
pancreas, there was only weak Id-2 immunoreactivity in the ductal cells (E).
Preabsorption with the Id-2-specific blocking peptide abolished the Id-2
immunoreactivity (F, right panel). Moderate to strong Id-3 immunoreactivity
was present in the duct-like cancer cells (G and I, left panel). Moderate to
strong Id-3 immunoreactivity was also present in the ductal cells of normal
pancreatic tissue samples (H). Id-3 immunoreactivity was completely abolished
by preabsorption with the Id-3-specific blocking peptide (I, right
panel). A, D, and G constitute serial sections of a pancreatic cancer sample,
revealing coexpression of the three Id proteins. Scale bars, 25 μm.

Cruz, CA) polyclonal antibodies were used for immunohistochemistry.
These affinity-purified rabbit polyclonal antibodies
specifically react with Id-1, Id-2, and Id-3, respectively,
of human origin, as determined by Western
blotting. Paraffin-embedded sections (4 μm) were subjected
to immunostaining using the streptavidin-peroxidase
technique. Where indicated, immunostaining for all
three Id proteins was performed on serial sections. Endogenous
peroxidase activity was blocked by incubation
for 30 minutes with 0.3% hydrogen peroxide in methanol.
Tissue sections were incubated for 15 minutes (23°C)
with 10% normal goat serum and then incubated for 16
hours at 4°C with the indicated antibodies in PBS containing
1% bovine serum albumin. Bound antibodies
were detected with biotinylated goat anti-rabbit IgG secondary
antibodies and streptavidin-peroxidase complex,
using diaminobenzidine tetrahydrochloride as the substrate.
Sections were counterstained with Mayer's hematoxylin.
Preabsorption with Id-1-, Id-2-, or Id-3-specific
blocking peptides completely abolished immunoreactivity
of the respective primary antibody. The immunohistochemical
results were semiquantitatively analyzed as described
previously. The percentage of positive
cancer cells was stratified into four groups: 0, no cancer
cells exhibiting immunoreactivity; 1, <33% of the cancer
cells exhibiting immunoreactivity; 2, 33 to 67% of the
cancer cells exhibiting immunoreactivity; 3, >67% of the
cancer cells exhibiting immunoreactivity. The intensity of
the immunohistochemical signal was also stratified into
four groups: 0, no immunoreactivity; 1, weak immunoreactivity;
2, moderate immunoreactivity; 3, strong immunoreactivity.
Finally, the sum of the results of the cell
score and the intensity score was calculated. Statistical
analysis was performed with SigmaStat software. The
rank sum test was used, and P < 0.05 was taken as the
level of significance.
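
The semiquantitative scoring scheme described above can be written out as a small sketch. The function and label names are hypothetical; only the cut-offs (none, <33%, 33 to 67%, >67%) and the four intensity grades come from the text.

```python
# Intensity grades from the text; the string labels are illustrative.
INTENSITY_SCORE = {"none": 0, "weak": 1, "moderate": 2, "strong": 3}


def cell_score(percent_positive):
    """Score the percentage of immunoreactive cells on the 0-3 scale."""
    if percent_positive <= 0:
        return 0  # no cells exhibiting immunoreactivity
    if percent_positive < 33:
        return 1  # <33% of cells
    if percent_positive <= 67:
        return 2  # 33 to 67% of cells
    return 3      # >67% of cells


def combined_score(percent_positive, intensity):
    """Sum of the cell score and the intensity score (range 0-6)."""
    return cell_score(percent_positive) + INTENSITY_SCORE[intensity]


# Example: half of the cells positive with moderate staining.
example = combined_score(50, "moderate")
```

The combined score therefore ranges from 0 to 6, which is the scale on which the group means in Table 1 are reported.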



Results 

Northern blot analysis of total RNA isolated from 12 normal
pancreatic tissues and 16 pancreatic cancers revealed
the presence of the 1.2-kb Id-1 transcript and the
1.6-kb Id-2 mRNA transcript in 11 of the 12 normal pancreatic
samples, and the 1.3-kb Id-3 mRNA transcript in
all normal pancreatic samples (Figure 1A, 2). In the cancer
tissues, Id-1 mRNA levels were elevated in 8 of 16
samples, Id-2 mRNA levels were elevated in 9 of these
samples, and Id-3 mRNA levels were elevated in 6 of
these samples (Figure 1A, 2). Concomitant overexpression
of all three Id species was observed in 6 of the
cancer samples (38%). In contrast, none of the Id mRNA
species were overexpressed in CP by comparison with
normal controls (Figure 1B, 2). Densitometric analysis of
all of the autoradiograms indicated that there was a 6.5-fold
increase (P < 0.01) in Id-1 mRNA levels, a fivefold
increase (P < 0.01) in Id-2 mRNA levels, and a twofold
increase (P = 0.027) in Id-3 mRNA levels in the pancreatic
cancer samples in comparison to normal controls
(Figure 2). In contrast, there was no statistically significant
difference in the expression levels of Id-1, Id-2, and
Id-3 in CP tissues in comparison to the corresponding
levels in the normal pancreas (Figure 2).

Next, we assessed the expression of the three Id
genes in 5 human pancreatic cancer cell lines by Northern
and Western blot analyses. Id-1 mRNA was present
at varying levels in all 5 cell lines (Figure 3). ASPC-1,
CAPAN-1, MIA-PaCa-2, and PANC-1 expressed moderate
to high levels of Id-1 mRNA, whereas COLO-357 cells

Figure 5. Immunohistochemistry of pancreatic cancer and dysplastic ducts in CP tissues. In the pancreatic cancer tissues (A-C) there was moderate to strong Id-1
(A), Id-2 (B), and Id-3 (C) immunoreactivity in the ductal cells in the areas adjacent to the cancer cells that exhibited CP-like alterations. Islet cells did not exhibit
Id immunoreactivity (outlined by solid arrowheads). In the CP samples, moderate to strong Id-1 (D), Id-2 (E), and Id-3 (F) immunoreactivity was present in the
cytoplasm of epithelial cells forming large dysplastic ducts. Scale bar, 25 μm.



expressed relatively low levels of this mRNA moiety.
Western blotting with a highly specific anti-Id-1 antibody
confirmed the presence of the approximately 14-kd Id-1
protein in the 4 cell lines that expressed high levels of
Id-1 mRNA (Figure 3). Furthermore, the three cell lines
with the highest Id-1 mRNA expression (CAPAN-1,
MIA-PaCa-2, and PANC-1) also exhibited the highest Id-1
protein expression. Variable levels of the 1.6-kb Id-2
mRNA transcript were present in all 5 cell lines. In addition,
a minor band of approximately 1.2 kb was visible in
COLO-357 and MIA-PaCa-2 cells. Immunoblot analysis
with a highly specific anti-Id-2 antibody revealed two
bands of approximately 16 and 18 kd at relatively high
levels in all of the cell lines with the exception of PANC-1
cells, in which the 16-kd band was relatively faint (Figure
3). With the exception of MIA-PaCa-2 cells, there was a
good correlation between Id-2 mRNA and protein levels
(Figure 3). Id-3 mRNA was present at high levels in
MIA-PaCa-2 cells, at moderate levels in COLO-357 cells,
and at low levels in PANC-1 cells. Id-3 mRNA was not
detectable in ASPC-1 and CAPAN-1 cells (Figure 3).
Immunoblot analysis with a highly specific anti-Id-3 antibody
revealed an approximately 14-kd band that was most
abundant in MIA-PaCa-2 cells, and was also readily apparent
in COLO-357 and PANC-1 cells. In contrast, only a faint
Id-3 band was seen in ASPC-1 and CAPAN-1 cells. Thus,
with the exception of PANC-1 cells, there was a good correlation
between Id-3 mRNA and protein levels.



To determine the localization of Id-1, Id-2, and Id-3,
immunostaining was carried out using the same highly
specific anti-Id antibodies. In the pancreatic cancers,
moderate to strong Id-1 immunoreactivity was present in
the cancer cells in 9 of 10 randomly selected cancer
samples. An example of moderate Id-1 immunoreactivity
is shown in Figure 4A, and of strong immunoreactivity in
Figure 4C (left panel). In contrast, in the normal pancreas,
faint Id-1 immunoreactivity was present only in the ductal
cells of pancreatic ducts (Figure 4B, arrowheads).
Preabsorption with the Id-1-specific blocking peptide
completely abolished the Id-1 immunoreactivity (Figure 4C,
right panel). The cancer cells also exhibited strong Id-2
(Figure 4, D and F, left panel) and moderate to strong Id-3
immunoreactivity. An example of moderate Id-3 immunoreactivity
is shown in Figure 4G, and of strong immunoreactivity
in Figure 4I (left panel). In contrast, only faint
Id-2 immunoreactivity was present in the ductal cells in
the normal pancreas (Figure 4E), whereas Id-3 immunoreactivity
in these cells was more variable and ranged
from moderate to occasionally strong (Figure 4H). Islet
cells and acinar cells were always devoid of Id immunoreactivity.
Preabsorption of the respective antibody with the
blocking peptides specific for Id-2 (Figure 4F, right panel)
and Id-3 (Figure 4I, right panel) completely abolished
immunoreactivity. Analysis of serial pancreatic cancer sections
revealed that there was often colocalization of the

Figure 6. Immunohistochemistry of atypical papillary epithelium in CP tissues. Serial section analysis of some CP samples revealed the presence of large duct-like
structures with atypical papillary epithelium. Mild to moderate Id-1 (A) and Id-2 (B) immunoreactivity and weak Id-3 (C) immunoreactivity was present in the
cytoplasm of the cells forming these large ducts with papillary structures. Some CP samples also exhibited moderate Id-3 immunoreactivity in these cells (D). Scale
bar, 25 μm.



three Id proteins. An example of serial sections from a
pancreatic cancer tissue is shown in Figure 4, A, D, and G.

Id-1, Id-2, and Id-3 immunoreactivity was also present
at moderate levels in the cytoplasm of ductal cells within
CP-like areas adjacent to the cancer cells (Figure 5, A-C).
As in the normal pancreas, islet cells (outlined by arrowheads)
did not exhibit Id immunoreactivity. In 4 of 9 CP
samples, there were foci of ductal cell dysplasia of relatively
large interlobular ducts, all of which exhibited moderate
to strong Id-1, Id-2, and Id-3 immunoreactivity (Figure
5, D-F). Five of 9 CP samples also contained foci of
large ducts exhibiting atypical papillary epithelium. Serial
section analysis of one of those CP samples revealed
mild to moderate Id-1 and Id-2 immunoreactivity and
weak Id-3 immunoreactivity in the cells of these atypical
papillary ducts (Figure 6, A-C). In contrast, in some of
these CP samples, moderate to strong Id-3 immunoreactivity
was also observed (Figure 6D). However, most of
the ductal cells forming the typical ductular structures of
CP, such as large interlobular ducts and small proliferating
ducts, exhibited generally only weak to occasionally
moderate Id immunoreactivity (data not shown).



The immunohistochemical data for Id-1, Id-2, and Id-3
are summarized in Table 1. In the case of Id-1 and Id-2,
the cancer cells as well as the dysplastic and atypical
papillary ducts in CP exhibited a significantly higher
score than the ductal cells in the normal pancreas. In
contrast, due to the marked variability in Id-3 immunostaining
in the normal pancreas, the differences between
normal and cancer cells and normal and dysplastic cells
did not achieve statistical significance.



Discussion 

Id proteins constitute a family of HLH transcription factors
that are important regulators of cellular differentiation and
proliferation. To date, four members of the human Id
family have been identified. Their expression is
enhanced during cellular proliferation and in response to
mitogenic stimuli, and overexpression of Id genes
inhibits differentiation and/or enhances proliferation in
several different cell types. The forced expression
of Id-1 in mouse small intestinal epithelium results in

Table 1. Histological Scoring

                                      Id-1          Id-2          Id-3
Normal (n = 6)
  Ductal cells                        2.0 ± 0.4     2.3 ± 0.2     2.5 ± 0.9
Cancer (n = 10)
  Cancer cells                        4.5* ± 0.5    5.2§ ± 0.3    4.5 ± 0.6
CP (n = 9)
  Typical CP lesions (n = 9)          2.7 ± 0.5     3.1 ± 0.6     3.4 ± 0.7
  Dysplastic ducts (n = 4)            5.3* ± 0.2    5.8* ± 0.2    5.3 ± 0.4
  Atypical papillary ducts (n = 5)    4.4* ± 0.2    5.2* ± 0.2    5.0 ± 0.4

Scoring of the histological specimens was performed as described in the Patients and Methods section. Values are the means ± SD of the number
of samples indicated in parentheses. P values are based on comparisons with the respective controls in the normal samples.
\P< 0.02; *P < 0.01; *P = 0.004; §P = 0.001. 



adenoma formation in these animals. The growth-promoting
effects of Id genes are thought to occur through
several mechanisms. For example, Id-2 can bind to members
of the pRB tumor suppressor family, thus blocking
their growth-suppressing activity, and Id-1 and Id-2
can antagonize the bHLH-mediated activation of known
inhibitors of cell cycle progression such as the cyclin-dependent
kinase inhibitor p21.

In the present study, we determined by Northern blot
analysis that a significant percentage of human pancreatic
cancers expressed increased Id-1, Id-2, and Id-3
mRNA levels. Increased expression was most evident for
Id-1 (6.5-fold) and Id-2 (fivefold). In contrast, Id-3 mRNA
levels were only twofold increased in the cancer samples,
partly because this mRNA was present at relatively high
levels in the normal pancreas. Immunohistochemical analysis
confirmed the presence of Id-1, Id-2, and Id-3 in the
cancer cells within the tumor mass, whereas in the normal
pancreas faint Id-1 and Id-2 immunoreactivity and moderate
to occasionally strong Id-3 immunoreactivity was
present in some ductal cells. Pancreatic acinar and islet
cells in the normal pancreas were devoid of Id-1, Id-2,
and Id-3 immunoreactivity. In the cancer samples, all
three Id proteins often colocalized in the cancer cells.
Coexpression of all three Id genes was also observed in
cultured pancreatic cancer cell lines, which often exhibited
a close correlation between Id mRNA and protein
expression. However, in MIA-PaCa-2 there was a divergence
of Id-2 mRNA and protein levels, and in PANC-1
cells, Id-3 mRNA levels did not correlate well with Id-3
protein expression. These observations suggest that in
these cells, the half-life of either Id mRNA or Id protein
may be altered by comparison with the other cell lines.
Interestingly, Id-2 immunoblotting revealed two closely
spaced bands of approximately 16 and 18 kd in 4 of 5
cell lines. In view of the fact that two possible initiation
codons have been reported for the Id-2 gene, our
observation raises the possibility that the two Id-2-immunoreactive
bands may represent separate translation
products of the Id-2 gene.

Pancreatic cancers often harbor p53 tumor suppressor
gene mutations and exhibit alterations in apoptosis
pathways. Thus, these cancers often exhibit increased
expression of anti-apoptotic proteins such as Bcl-2 and
abnormal resistance to Fas-ligand-mediated apoptosis.
It has been shown recently that forced constitutive
expression of Id genes together with the expression of
anti-apoptotic genes such as Bcl-2 or Bcl-XL can result in
malignant transformation of human fibroblasts, raising
the possibility that the enhanced Id expression in pancreatic
cancers together with increased expression of
anti-apoptotic genes may contribute to the malignant
potential of pancreatic cancer cells in vivo.

In the CP tissues there was no significant increase in
Id-1, Id-2, and Id-3 mRNA levels in comparison to the
normal pancreas. Immunohistochemical analysis of pancreatic
cancer samples revealed colocalization of weak
to moderate Id-1, Id-2, and Id-3 immunoreactivity in proliferating
ductal cells in the CP-like regions adjacent to
the cancer cells, indicating that Id expression was not
restricted to the cancer cells. Similarly, analysis of CP
samples indicated weak Id-1, Id-2, and Id-3 immunoreactivity
in the cells of small proliferating ducts and large
ducts without dysplastic changes. In general, there was a
correlation between weak immunoreactivity and low Id
mRNA levels. However, in samples that harbored large
ducts with papillary structures there was moderate Id
immunoreactivity, and in the cells forming dysplastic
ducts there was moderate to strong Id immunoreactivity.
In these CP samples, Id mRNA levels were relatively
higher than in the CP samples that were devoid of these
histological changes. Overall, however, increased Id expression,
most notably of Id-1 and Id-2, distinguished a
subgroup of pancreatic cancers from CP (Table 1).

Epidemiological studies have shown that the risk of
developing pancreatic cancer is increased up to 16-fold
in patients with pre-existing CP in comparison to the
general population. The mechanisms that contribute to
neoplastic transformation in CP are not known. Although
there is no established tumor progression model for pancreatic
cancer, such as the adenoma-carcinoma sequence
of colorectal carcinoma, it is generally accepted
that K-ras and p16 mutations occur relatively
early in pancreatic carcinogenesis, whereas p53 mutations
occur late in this process. Increased Id expression
may contribute to malignant transformation of
cultured cell lines in vitro and has been linked to cell
invasion in a murine mammary epithelial cell line. In
view of the current findings that Id-1, Id-2, and Id-3 are
overexpressed in pancreatic cancer and in dysplastic/
metaplastic ducts in CP, these observations raise the
possibility that elevated levels of Id-1, Id-2, and, to a
lesser extent, Id-3 may represent relatively early markers
of pancreatic malignant transformation and may contribute
to the pathobiology of pancreatic cancer.

In this study, we examined yeast proteins by two-dimensional (2D) gel electrophoresis and gathered quantitative
information from about 1,400 spots. We found that there is an enormous range of protein abundance
and, for identified spots, a good correlation between protein abundance, mRNA abundance, and codon bias. 
For each molecule of well-translated mRNA, there were about 4,000 molecules of protein. The relative 
abundance of proteins was measured in glucose and ethanol media. Protein turnover was examined and found 
to be insignificant for abundant proteins. Some phosphoproteins were identified. The behavior of proteins in 
differential centrifugation experiments was examined. Such experiments with 2D gels can give a global view of 
the yeast proteome. 



The sequence of the yeast genome has been determined (9). 
More recently, the number of mRNA molecules for each ex- 
pressed gene has been measured (27, 30). The next logical level 
of analysis is that of the expressed set of proteins. We have 
begun to analyze the yeast proteome by using two-dimensional 
(2D) gels. 

2D gel electrophoresis separates proteins according to iso- 
electric point in one dimension and molecular weight in the 
other dimension (21), allowing resolution of thousands of pro- 
teins on a single gel. Although modern imaging and computing 
techniques can extract quantitative data for each of the spots in 
a 2D gel, there are only a few cases in which quantitative data 
have been gathered from 2D gels. 2D gel electrophoresis is 
almost unique in its ability to examine biological responses 
over thousands of proteins simultaneously and should there- 
fore allow us a relatively comprehensive view of cellular me- 
tabolism. 

We and others have worked toward assembling a yeast pro- 
tein database consisting of a collection of identified spots in 2D 
gels and of data on each of these spots under various condi- 
tions (2, 7, 8, 10, 23, 25). These data could then be used in 
analyzing a protein or a metabolic process. Saccharomyces 
cerevisiae is a good organism for this approach since it has a 
well-understood physiology as well as a large number of mu- 
tants, and its genome has been sequenced. Given the sequence 
and the relative lack of introns in S. cerevisiae, it is easy to
predict the sequence of the primary protein product of most 
genes. This aids tremendously in identifying these proteins on 
2D gels. 

There are three pillars on which such a database rests: (i) 
visualization of many protein spots simultaneously, (ii) quan- 
tification of the protein in each spot, and (iii) identification of 
the gene product for each spot. Our first efforts at visualization 
and identification for S. cerevisiae have been described elsewhere
(7, 8). Here we describe quantitative data for these
proteins under a variety of experimental conditions. 

MATERIALS AND METHODS 
Strains and media. S. cerevisiae W303 (MATa ade2-1 his3-11,15 leu2-3,112
trp1-1 ura3-1 can1-100) was used (26). -Met YNB (yeast nitrogen base) medium
was 1.7 g of YNB (Difco) per liter, 5 g of ammonium sulfate per liter, and
adenine, uracil, and all amino acids except methionine; -Met -Cys YNB medium
was the same but without methionine or cysteine. Medium was supplemented
with 2% glucose (for most experiments) or with 2% ethanol (for ethanol
experiments). Low-phosphate YEPD was described by Warner (28).

Isotopic labeling of yeast and preparation of cell extracts. Yeast strains were
labeled and proteins were extracted as described by Garrels et al. (7, 8). Briefly,
cells were grown to 5 x 10* cells per ml at 30°C; 1 ml of culture was transferred
to a fresh tube, and 0.3 mCi of [35S]methionine (e.g., Express protein labeling
mix; New England Nuclear) was added to this 1-ml culture. The cells were
incubated for a further 10 to 15 min and then transferred to a 1.5-ml microcentrifuge
tube, chilled on ice, and harvested by centrifugation. The supernatant was
removed, and the cell pellet was resuspended in 100 μl of lysis buffer (20 mM
Tris-HCl [pH 7.6], 10 mM NaF, 10 mM sodium pyrophosphate, 0.5 mM EDTA,
0.1% deoxycholate; just before use, phenylmethylsulfonyl fluoride was added to
1 mM, leupeptin was added to 1 μg/ml, pepstatin was added to 1 μg/ml, tosylsulfonyl
phenylalanyl chloromethyl ketone was added to 10 μg/ml, and soybean
trypsin inhibitor was added to 10 μg/ml).

The resuspended cells were transferred to a screw-cap 1.5-ml polypropylene
tube containing 0.28 g of glass beads (0.5-mm diameter; Biospec Products) or
0.40 g of zirconia beads (0.5-mm diameter; Biospec Products). After the cap was
secured, the tube was inserted into a MiniBeadbeater 8 (Biospec Products) and
shaken at medium high speed at 4°C for 1 min. Breakage was typically 75%.
Tubes were then spun in a microcentrifuge for 10 s at 5,000 x g at 4°C.

With a very fine pipette tip, liquid was withdrawn from the beads and transferred
to a prechilled 1.5-ml tube containing 7 μl of DNase I (0.5 mg/ml; Cooper
product no. 6330)-RNase A (0.25 mg/ml; Cooper product no. 5679)-Mg (50 mM
MgCl2) mix. Typically 70 μl of liquid was recovered. The mixture was incubated
on ice for 10 min to allow the RNase and DNase to work.

Next, 75 μl of 2x dSDS (2x dSDS is 0.6% sodium dodecyl sulfate [SDS], 2%
mercaptoethanol, and 0.1 M Tris-HCl [pH 8]) was added. The tube was plunged
into boiling water, incubated for 1 min, and then plunged into ice. After cooling,
the tube was centrifuged at 4°C for 3 min at 14,000 x g. The supernatant was
transferred to a fresh tube and frozen at -70°C. About 5 μl of this supernatant
was used for each 2D gel.

2D polyacrylamide gels. 2D gels were made and run as described elsewhere 
(6-8). 

Image analysis of the gels. The Quest II software system was used for quantitative
image analysis (20, 22). Two techniques were used to collect quantitative
data for analysis by Quest II software. First, before the advent of phosphorimagers,
gels were dried and fluorographed. Each gel was exposed to film for three
different times (typically 1 day, 2 weeks, and 6 weeks) to increase the dynamic
range of the data. The films were scanned along with calibration strips to relate
film optical density to disintegrations per minute in the gels and analyzed by the
software to obtain a linear relationship between disintegrations per minute in the
spots and optical densities of the film images. The quantitative data are expressed
as parts per million of the total cellular protein. This value is calculated
from the disintegrations per minute of the sample loaded onto the gel and by
comparing the film density of each data spot with density of the film over the
calibration strips of known radioactivity exposed to the same film. This yields the
disintegrations per minute per millimeter for each spot on the gel and thence its
parts-per-million value.

After the advent of phosphorimaging, gels bearing 35S-labeled proteins were
exposed to phosphorimager screens and scanned by a Fuji phosphorimager, 
typically for two exposures per gel. Calibration strips of known radioactivity were 
exposed simultaneously. Scan data from the phosphorimager was assimilated by 
Quest II software, and quantitative data were recorded for the spots on the gels. 

Measurements of protein turnover. Cells in exponential phase were
pulse-labeled with [35S]methionine, excess cold Met and Cys were added, and samples
of equal volume were taken from the culture at intervals up to 90 min (in one
experiment) or up to 160 min (in a second experiment). Incorporation of 35S into
protein was essentially 100% by the first sample (10 min). Extracts were made,
and equal fractions of the samples were loaded on 2D gels (i.e., the different
samples had different amounts of protein but equal amounts of 35S). Spots were
quantitated with phosphorimaging and Quest software.

The software was queried for spots whose radioactivity decreased through the
time course. The algorithm examined all data points for all spots, drew a best-fit
line through the data points, and looked for spots where this line had a
statistically significant negative slope. In one of the experiments, there was one such
spot. To the eye, this was a minor, unidentified spot seen only in the first two
samples (10 and 20 min). In the other experiment, the Quest software found no
spots meeting the criteria. Therefore, we concluded that none of the identified
spots, and at most one of the visible spots, represented a protein with a short
half-life.
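
The slope test described above can be sketched as follows. This is only an illustration under stated assumptions: the Quest software's actual algorithm and significance criterion are not given in the text, so the function names, the rough cutoff of two standard errors, and the spot counts below are all hypothetical.

```python
import math

def slope_with_stderr(times, values):
    """Ordinary least-squares slope of values vs. times, with its standard error."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    residual_ss = sum((v - (intercept + slope * t)) ** 2
                      for t, v in zip(times, values))
    stderr = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, stderr

def is_decaying(times, values, t_threshold=2.0):
    """Flag a spot whose slope is negative by more than t_threshold standard errors.

    The threshold is an illustrative assumption, not the criterion used by Quest.
    """
    slope, stderr = slope_with_stderr(times, values)
    if stderr == 0:
        return slope < 0
    return slope / stderr < -t_threshold

# Hypothetical radioactivity readings for two spots over a chase (minutes).
chase_minutes = [10, 20, 30, 60, 90]
stable_spot = [1000, 1010, 990, 1005, 995]
unstable_spot = [1000, 800, 640, 330, 170]

print(is_decaying(chase_minutes, stable_spot))
print(is_decaying(chase_minutes, unstable_spot))
```

A spot with roughly constant radioactivity is not flagged, whereas a spot that loses most of its label during the chase is.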

Centrifugal fractionation. Cells were labeled, harvested, and broken with glass
beads by the standard method described above except that no detergent (i.e., no
deoxycholate) was present in the lysis buffer. The crude lysate was cleared of
unbroken cells and large debris by centrifugation at 300 × g for 30 s. The
supernatant of this centrifugation was then spun at 16,000 × g for 10 min to give
the pellet used for Fig. 6B. The supernatant of the 16,000 × g, 10-min spin was
then spun at 100,000 × g for 30 min to give the supernatant used for Fig. 6A.

Protein abundance calculations. A haploid yeast cell contains about 4 × 10^-12
g of protein (1, 15). Assuming a mean protein mass of 50 kDa, there are about
50 × 10^6 molecules of protein per cell. There are about 1.8 methionines per 10
kDa of protein mass, which implies 4.5 × 10^8 molecules of methionine per cell
(neglecting the small pool of free Met). We measured (i) the counts per minute
in each spot on the 2D gels, (ii) the total number of counts on each gel (by
integrating counts over the entire gel), and (iii) the total number of counts
loaded on the gel (by scintillation counting of the original sample). Thus, we
know what fraction of the total incorporated radioactivity is present in each spot.
After correcting for the methionine (and cysteine [see below]) content of each
protein, we calculated an absolute number of protein molecules based on the
fraction of radioactivity in each spot and on 50 × 10^6 total molecules per cell.

The labeling mixture used contained about one-fifth as much radioactive
cysteine as radioactive methionine. Therefore, the number of cysteine molecules
per protein was also taken into account in calculating the number of molecules
of protein, but Cys molecules were weighted one-fifth as heavily as Met
molecules.
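
The arithmetic in the two preceding paragraphs can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the average cysteine content per protein is not stated in the text, so it is left as a caller-supplied assumption.

```python
AVOGADRO = 6.022e23  # molecules per mole

def total_protein_molecules(protein_grams_per_cell=4e-12, mean_mass_da=50_000):
    """Protein molecules per cell from total protein mass and mean protein mass."""
    return protein_grams_per_cell / mean_mass_da * AVOGADRO

def label_weight(n_met, n_cys, cys_weight=0.2):
    """Relative labeling of a protein: Cys counts one-fifth as heavily as Met."""
    return n_met + cys_weight * n_cys

def molecules_per_cell(spot_fraction, n_met, n_cys, avg_met, avg_cys,
                       total_molecules=50e6):
    """Molecules of one protein per cell.

    spot_fraction is the fraction of total incorporated radioactivity found in
    the spot. avg_met and avg_cys describe the average protein; avg_cys is an
    assumption because the text does not give it.
    """
    average = label_weight(avg_met, avg_cys)
    return spot_fraction * total_molecules * average / label_weight(n_met, n_cys)

# About 4.8e7, which the text rounds to 50 x 10^6.
print(total_protein_molecules())

# 1.8 Met per 10 kDa gives 9 Met in an average 50-kDa protein, hence
# 9 * 50e6 = 4.5e8 Met per cell.
print(1.8 * 5 * 50e6)

# A hypothetical spot holding 0.1% of the radioactivity, for a protein with
# twice the average Met and Cys content (avg_cys=5 is an assumed value).
print(molecules_per_cell(0.001, n_met=18, n_cys=10, avg_met=9, avg_cys=5))
```

A protein carrying more label per molecule than average needs fewer molecules to account for the same fraction of radioactivity, which is why the label content appears in the denominator.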

mRNA abundance calculations. For estimation of mRNA abundance, we used
SAGE (serial analysis of gene expression) data (27) and Affymetrix chip
hybridization data (29a, 30). The mRNA column in Table 1 shows mRNA abundance
calculated from SAGE data alone. However, the SAGE data came from cells
growing in YEPD medium, whereas our protein measurements were from cells
growing in YNB medium. In addition, SAGE data for low-abundance mRNAs
suffer from statistical variation. Therefore, we also used chip hybridization data
(29a, 30) for mRNA from cells grown in YNB. These hybridization data also had
disadvantages. First, the amounts of high-abundance mRNAs were
systematically underestimated, probably because of saturation in the hybridizations, which
used 10 µg of cRNA. For example, the abundance of ADH1 mRNA was 197
copies per cell by SAGE but only 32 copies per cell by hybridization, and the
abundance of ENO2 mRNA was 248 copies per cell by SAGE but only 41 by
hybridization. When the amount of cRNA used in the hybridization was reduced
to 1 µg, the apparent amounts of mRNA were similar to the amounts determined
by SAGE (29a, 29b). However, experiments using 1 µg of cRNA have been done
for only some genes (29a). Because amounts of mRNA were normalized to
15,000 per cell, and because the amounts of abundant mRNAs were
underestimated, there is a 2.2-fold overestimate of the abundance of nonabundant
mRNAs. We calculated this factor of 2.2 by adding together the number of
mRNA molecules from a large number of genes expressed at a low level for both
SAGE data and hybridization data. The sum for the same genes from
hybridization data is 2.2-fold greater than that from SAGE data.

To take into account these difficulties, we compiled a list of "adjusted" mRNA 
abundance as follows. For all high-abundance mRNAs of our identified proteins,
we used SAGE data. For all of these particular mRNAs, chip hybridization 
suggested that mRNA abundance was the same in YEPD and YNB media. For 
medium-abundance mRNAs, SAGE data were used, but when hybridization 
data showed a significant difference between YEPD and YNB, then the SAGE 
data were adjusted by the appropriate factor. Finally, for low-abundance 
mRNAs, we used data from chip hybridizations from YNB medium but divided 
by 2.2 to normalize to the SAGE results. These calculations were completed 
without reference to protein abundance. 
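
The adjustment rule can be written out as a short sketch. The text gives neither the cutoffs separating high-, medium-, and low-abundance mRNAs nor what counted as a "significant difference" between media, so the abundance class is passed in by the caller and the twofold threshold below is purely an assumed placeholder. The function name and the example inputs are hypothetical.

```python
def adjusted_mrna(sage, chip_ynb, chip_yepd, abundance_class,
                  shift_threshold=2.0, low_factor=2.2):
    """Adjusted mRNA copies per cell, following the three-tier rule in the text.

    sage            copies per cell from SAGE (YEPD medium)
    chip_ynb        copies per cell from chip hybridization, YNB medium
    chip_yepd       copies per cell from chip hybridization, YEPD medium
    abundance_class "high", "medium", or "low" (cutoffs are not given in the text)
    shift_threshold assumed fold change treated as a significant medium effect
    low_factor      the 2.2-fold normalization between chip and SAGE data
    """
    if abundance_class == "high":
        return sage
    if abundance_class == "medium":
        ratio = chip_ynb / chip_yepd
        if ratio >= shift_threshold or ratio <= 1.0 / shift_threshold:
            return sage * ratio
        return sage
    if abundance_class == "low":
        return chip_ynb / low_factor
    raise ValueError("abundance_class must be 'high', 'medium', or 'low'")

# Hypothetical inputs for each tier.
print(adjusted_mrna(197, 32, 32, "high"))
print(adjusted_mrna(10, 30, 10, "medium"))
print(adjusted_mrna(1, 6.16, 6.16, "low"))
```

Keeping the class and the threshold as explicit arguments makes the unstated choices visible instead of hiding them inside the function.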

CAI. The codon adaptation index (CAI) was taken from the yeast proteome 
database (YPD) (13), for which calculations were made according to Sharp and 
Li (24). Briefly, the index uses a reference set of highly expressed genes to assign 
a value to each codon, and then a score for a gene is calculated from the 
frequency of use of the various codons in that gene (24). 
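
As a rough illustration of the index, the sketch below assigns each codon a weight equal to its count in the reference set divided by the count of the most-used synonymous codon, then scores a gene as the geometric mean of the weights of its codons. The reference counts and the two-amino-acid codon table are toy values invented for the example. Codons absent from the weight table (for instance Met, Trp, and stop codons) are simply skipped, a simplification of the published method.

```python
import math

def relative_adaptiveness(reference_counts, synonymous_families):
    """Weight per codon: its reference count over the maximum in its family."""
    weights = {}
    for family in synonymous_families:
        top = max(reference_counts.get(codon, 0) for codon in family)
        for codon in family:
            count = reference_counts.get(codon, 0)
            if top > 0 and count > 0:
                weights[codon] = count / top
    return weights

def codon_adaptation_index(sequence, weights):
    """Geometric mean of the weights of the scored codons in a coding sequence."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Toy reference set: two amino acids, two codons each (hypothetical counts).
families = [("AAA", "AAG"), ("GAA", "GAG")]
reference = {"AAG": 90, "AAA": 10, "GAA": 80, "GAG": 20}
w = relative_adaptiveness(reference, families)

print(codon_adaptation_index("AAGGAA", w))  # only preferred codons
print(codon_adaptation_index("AAAGAG", w))  # only rare codons
```

A gene built only from preferred codons scores 1, and a gene built from rare codons scores much lower, which is the sense in which the index lies between 0 and 1.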

Statistical analysis. The JMP program was used with the aid of T. Tully. The
JMP program showed that neither mRNA nor protein abundances were
normally distributed; therefore, Spearman rank correlation coefficients (r_s) were
calculated. The mRNA (adjusted and unadjusted) and protein data were also
transformed so that Pearson product-moment correlation coefficients (r_p) could
be calculated. First, this was done by a Box-Cox transformation of
log-transformed data. This transformation produced normal distributions, and an r_p of
0.76 was achieved. However, because the Box-Cox transformation is complex, we
also did a simpler logarithmic transformation. This produced a normal
distribution for the protein data. However, the distribution for the mRNA and adjusted
mRNA data was close to, but not quite, normal. Nevertheless, we calculated the
r_p and found that it was 0.76, identical to the coefficient from the Box-Cox
transformed data. We therefore believe that this correlation coefficient is not
misleading, despite the fact that the log(mRNA) distribution is not quite normal.
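
The two coefficients can be computed with the short sketch below, which uses plain Python in place of JMP. The seven value pairs are a small subset of legible rows from Table 1 (adjusted mRNA copies per cell and protein in thousands of molecules per cell), chosen only for illustration. The result is not expected to reproduce the coefficients reported for the full data set, and the Box-Cox step is omitted.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def pearson_log(x, y):
    """Pearson correlation after a log10 transformation of both variables."""
    return pearson([math.log10(v) for v in x], [math.log10(v) for v in y])

# Adh1, Eno2, Fba1, Pdc1, Pfk1, Pgi1, Tdh3 (illustrative subset of Table 1).
adjusted_mrna = [197, 248, 179, 226, 5, 14, 460]
protein_thousands = [1230, 650, 640, 280, 75, 160, 1670]

print(spearman(adjusted_mrna, protein_thousands))
print(pearson_log(adjusted_mrna, protein_thousands))
```

The rank-based coefficient makes no assumption about the shape of the distributions, whereas the Pearson coefficient on log-transformed values relies on the transformed data being close to normal, which is the concern raised in the paragraph above.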



RESULTS 

Visualization of 1,400 spots on three gel systems. Yeast 
proteins have isoelectric points ranging from 3.1 to 12.8, and
masses ranging from less than 10 kDa to 470 kDa. It is difficult
to examine all proteins on a single kind of gel, because a gel
with the needed range in pI and mass would give poor
resolution of the thousands of spots in the central region of the gel.
Therefore, we have used three gel systems: (i) pH "4 to 8" with 
10% polyacrylamide; (ii) pH "3 to 10" with 10% polyacryl- 
amide; and (iii) nonequilibrium with 15% polyacrylamide (7, 
8). Each gel system allows good resolution of a subset of yeast 
proteins. 

Figure 1 shows a pH 4-8, 10% polyacrylamide gel. The pH 
at the basic end of the isoelectric focusing gel cannot be main- 
tained throughout focusing, and so the proteins resolved on 
such gels have isoelectric points between pH 4 and pH 6.7. For 
these pH 4-8 gels, we see 600 to 900 spots on the best gels after 
multiple exposures. 

The pH 3-10 gels (not shown) extend the pI range somewhat
beyond pH 7.5, allowing detection of several hundred addi- 
tional spots. Finally, we use nonequilibrium gels with 15% 
acrylamide in the second dimension. These allow visualization 
of about 100 very basic proteins and about 170 small proteins 
(less than 20 kDa). In total, using all three gel systems, about 
1,400 spots can be seen. These represent about 1,200 different 
proteins, which is about one-quarter to one-third of the pro- 
teins expressed under these conditions (27, 30). Here, we focus 
on the proteins seen on the pH 4-8 gels. 

Although nearly all expressed proteins are present on these 
gels, the number seen is limited by a problem we call coverage. 
Since there are thousands of proteins on each gel, many pro- 
teins comigrate or nearly comigrate. When two proteins are 
resolved, but are close together, and one protein spot is much 
more intense than the other, a problem arises in visualizing the 
weaker spot: at long exposures when the weak signal is strong 
enough for detection, the signal from the strong spot spreads 
and covers the signal from the weaker spot. Thus, weak spots 
can be seen only when they are well separated from strong 
spots. 

For a given gel, the number of detectable spots initially rises 
with exposure time. However, beyond an optimal exposure, the 
number of distinguishable spots begins to decrease, because 
signals from strong spots cover signals from nearby weak spots. 
At long exposures, the whole autoradiogram turns black. Thus, 
there is an optimum exposure yielding the maximum number 
of spots, and at this exposure the weakest spots are not seen. 

Largely because of the problem of coverage, the proteins 
seen are strongly biased toward abundant proteins. All identi- 
fied proteins have a CAI of 0.18 or more, and we have iden- 
tified no transcription factors or protein kinases, which are 
nonabundant proteins. Thus, this technology is useful for ex- 
amining protein synthesis, amino acid metabolism, and glyco- 
lysis but not for examining transcription, DNA replication, or 
the cell cycle. 





FIG. 1. 2D gels. The horizontal axis is the isoelectric focusing dimension, which stretches from pH 6.7 (left) to pH 4.3 (right). The vertical axis is the polyacrylamide
gel dimension, which stretches from about 15 kDa (bottom) to at least 130 kDa (top). For panel A, extract was made from cells in log phase in glucose; for panel B,
cells were grown in ethanol. The spots labeled 1 through 6 are unidentified proteins highly induced in ethanol.

Spot identification. The identification of various spots has
been described elsewhere (7, 8). At present, 169 different spots
representing 148 proteins have been identified. Many of these
spots have been independently identified (2, 10, 23, 25). The
main methods used in spot identification have been analysis of
amino acid composition, gene overexpression, peptide
sequencing, and mass spectrometry.

Pulse-chase experiments and protein turnover. Pulse-chase
experiments were done to measure protein half-lives (Materials
and Methods). Cells were labeled with [35S]methionine for
10 min, and then an excess of unlabeled methionine was added.
Samples were taken at 0, 10, 20, 30, 60, and 90 min after the
beginning of the chase. Equal amounts of 35S were loaded from
each sample; 2D gels were run, and spots were quantitated.
Surprisingly, almost every spot was nearly constant in amount
of radioactivity over the entire time course (not shown). A few
spots shifted from one position to another because of
posttranslational modifications (e.g., phosphorylation of Rpa0 and
Efb1). Thus, the proteins being visualized are all or nearly all
very stable proteins, with half-lives of more than 90 min. Gygi 
et al. (10) have come to a similar conclusion by using the N-end 
rule to predict protein half-lives. This result does not imply 
that all yeast proteins are stable. The proteins being visualized 
are abundant proteins; this is partly because they are stable 
proteins. 

Protein quantitation. Because all of the proteins seen had
effectively the same half-life, the abundance of each protein 
was directly proportional to the amount of radioactivity incor- 
porated during labeling. Thus, after taking into account the 
total number of protein molecules per cell, the average content 
of methionine and cysteine, and the methionine and cysteine 
content of each identified protein, we could calculate the abun- 
dance of each identified protein (Tables 1 and 2; Materials and 
Methods). About 1,000 unidentified proteins were also quan- 
tified, assuming an average content of Met and Cys. 

Many proteins give multiple spots (7, 8). The contribution 
from each spot was summed to give the total protein amount. 
However, many proteins probably have minor spots that we are 
not aware of, causing the amount of protein to be underesti- 
mated. 

When the proteins on a pH 4-8 gel were ordered by abun- 
dance, the most abundant protein had 8,904 ppm, the 10th 
most abundant had 2,842 ppm, the 100th most abundant had 
314 ppm, the 500th most abundant had 57 ppm, and the 
1,000th most abundant (visualized at greater than optimum 
exposure) had 23 ppm. Thus, there is more than a 300-fold 
range in abundance among the visualized proteins. The most 
abundant 10 proteins account for about 25% of the total pro- 
tein on the pH 4-8 gel, the most abundant 60 proteins account 
for 50%, and the most abundant 500 proteins account for 80%. 
Since it seems likely that the pH 4-8 gels give a representative
sampling of all proteins, we estimate that half of the total
cellular protein is accounted for by fewer than 100 different
gene products, principally glycolytic enzymes and proteins
involved in protein synthesis.

Correlation of protein abundance with mRNA abundance. 
Estimates of mRNA abundance for each gene have been made 
by SAGE (27) and by hybridization of cRNA to oligonucleo- 
tide arrays (30). These two methods give broadly similar re- 
sults, yet each method has strengths and weaknesses (Materials 
and Methods). Table 1 lists the number of molecules of mRNA 
per cell for each gene studied. One measurement (mRNA) 
uses data from SAGE analysis alone (27); a second incorpo- 
rates data from both SAGE and hybridization (30) (adjusted 
mRNA) (Table 1; Materials and Methods). We correlated 
protein abundance with mRNA abundance (Fig. 2). For
adjusted mRNA versus protein, the Spearman rank correlation
coefficient, r_s, was 0.74 (P < 0.0001), and the Pearson
correlation coefficient, r_p, on log-transformed data (Materials and
Methods) was 0.76 (P < 0.00001). We obtained similar
correlations for mRNA versus protein and also for other data
transformations (Materials and Methods). Thus, several statistical
methods show a strong and significant correlation between
mRNA abundance and protein abundance. Of course, the
correlation is far from perfect; for mRNAs of a given abundance,
there is at least a 10-fold range of protein abundance (Fig. 2).
Some of this scatter is probably due to posttranscriptional
regulation, and some is due to errors in the mRNA or protein
data. For example, the protein Yef3 runs poorly on our gels,
giving multiple smeared spots. Its abundance has probably
been underestimated, partly explaining the low protein/mRNA
ratio of Yef3. It is the most extreme outlier in Fig. 2.

These data on mRNA (27, 30) and protein abundance (Ta- 
ble 1) suggest that for each mRNA molecule, there are on 
average 4,000 molecules of the cognate protein. For instance, 
for Act1 (actin) there are about 54 molecules of mRNA per
cell and about 205,000 molecules of protein. Assuming an 
mRNA half-life of 30 min (12) and a cell doubling time of 120 
min, this suggests that an individual molecule of mRNA might 
be translated roughly 1,000 times. These calculations are lim- 
ited to mRNAs for abundant proteins, which are likely to be 
the mRNAs that are translated best. 
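
The figures in this paragraph can be checked with a few lines of arithmetic. The text does not spell out how the estimate of roughly 1,000 translations was derived. The sketch assumes the simplest reading, in which one mRNA molecule lasts about one half-life (30 min) out of the 120 min needed to make a full complement of protein. The function names are hypothetical.

```python
def proteins_per_mrna(protein_molecules, mrna_molecules):
    """Steady-state protein molecules per cognate mRNA molecule."""
    return protein_molecules / mrna_molecules

def translations_per_mrna(ratio, mrna_half_life_min=30, doubling_time_min=120):
    """Rough number of times one mRNA molecule is translated.

    Assumes an mRNA molecule is available for about one half-life, while a
    full complement of protein takes one doubling time to make.
    """
    return ratio * mrna_half_life_min / doubling_time_min

act1_ratio = proteins_per_mrna(205_000, 54)
print(round(act1_ratio))                         # close to the average of 4,000
print(round(translations_per_mrna(act1_ratio)))  # on the order of 1,000
```

Using the mean lifetime (half-life divided by ln 2) instead of the half-life would raise the estimate by about 40%, which is still consistent with "roughly 1,000".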

A full complement of cell protein is synthesized in about 120 
min under these conditions. Thus, 4,000 molecules of protein 
per molecule of mRNA implies that translation initiates on an 
mRNA about once every 2 s. This is a remarkably high rate; it 
implies that if an average mRNA bears 10 ribosomes engaged
in translation, then each ribosome completes translation in
20 s; if an average protein has 450 residues, this in turn implies
translation of over 20 amino acids per s, a rate considerably
higher than estimated for mammals (3 to 8 amino acids per
s) (18). These estimates depend on the amount of mRNA per
cell (11, 27). 
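
The chain of estimates in this paragraph is reproduced in the sketch below, using only numbers stated in the text. The function names are hypothetical, and unrounded intermediate values are carried through, so the results differ slightly from the rounded figures quoted above.

```python
def initiation_interval_s(proteins_per_mrna=4000, synthesis_time_min=120):
    """Seconds between successive translation initiations on one mRNA."""
    return synthesis_time_min * 60 / proteins_per_mrna

def elongation_rate_aa_per_s(interval_s, ribosomes_per_mrna=10, residues=450):
    """Amino acids per second per ribosome.

    With ribosomes_per_mrna ribosomes on the message at once, each ribosome
    takes ribosomes_per_mrna initiation intervals to finish one protein.
    """
    transit_time_s = interval_s * ribosomes_per_mrna
    return residues / transit_time_s

interval = initiation_interval_s()
print(interval)                            # about 2 s between initiations
print(elongation_rate_aa_per_s(interval))  # over 20 amino acids per second
```

Because the initiation interval scales inversely with the protein/mRNA ratio, any revision of the number of mRNA molecules per cell changes the estimated elongation rate proportionally.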

The large number of protein molecules that can be made 
from a single mRNA raises the issue of how abundance is 
controlled for less abundant proteins. Many nonabundant pro- 
teins may be unstable, and this would reduce the protein/ 
mRNA ratio. In addition, many nonabundant proteins may be 
translated at suboptimal rates. We have found that mRNAs for 
nonabundant proteins usually have suboptimal contexts for 
translational initiation. For example, there are over 600 yeast 
genes which probably have short open reading frames in the 
mRNA upstream of the main open reading frame (17a). These 
may be devices for reducing the amount of protein made from 
a molecule of mRNA. 

Correlation of codon bias with protein abundance. The 
mRNAs for highly expressed proteins preferentially use some 
codons rather than others specifying the same amino acid (14). 
This preference is called codon bias. The codons preferred are 
those for which the tRNAs are present in the greatest amounts. 
Use of these codons may make translation faster or more 
efficient and may decrease misincorporation. These effects are 
most important for the cell for abundant proteins, and so 
codon bias is most extreme for abundant proteins. The effect 
can be dramatic — highly biased mRNAs may use only 25 of the 
61 codons. 

We asked whether the correlation of codon bias with abun- 
dance continues for medium-abundance proteins. There are 
various mathematical expressions quantifying codon bias; here, 
we have used the CAI (24) (Materials and Methods) because 
it gives a result between 0 and 1. The correlation coefficient for CAI versus protein
abundance is 0.80 (P < 0.0001), similar to the mRNA-protein 









TABLE 1. Quantitative data^a

Function                  Name     CAI     mRNA     Adjusted  Protein (Glu)  Protein (Eth)  E/G
                                                    mRNA      (10^3)         (10^3)         ratio

Carbohydrate metabolism   Adh1     0.810   197      197       1,230          972            0.79
                          Adh2     0.504   0                  0              963            >20
                          Cit2     0.185   1        2.8       23             288            12
                          Eno1     0.870                      410            974            2.4
                          Eno2     0.892   248      248       650            215            0.33
                          Fba1     0.868   179      179       640            608            0.95
                          Hxk1,2   0.500   13       105       62             46
                          Icl1     0.251   0                  0              671            >20
                          Pdb1     0.342   5        5         41             33
                          Pdc1     0.903   226      226       280            205            0.73
                          Pfk1     0.465   5        5         75             53             0.71
                          Pgi1     0.681   14       14        160            120            0.75
                          Pyc1             1        0.7       37             34
                          Tal1     0.579   5        5         110            35
                          Tdh2     0.904   63       63        430            876            NR
                          Tdh3     0.924   460      460       1,670          1,927          NR
                          Tpi1     0.817   No Nla             No Met         No Met

Protein synthesis         Efb1     0.762   33       16.5      358            [illegible]    0.55
                          Eft1,2   0.801   26       26        99             54
                          Prt1     0.303   4        0.7       12             6              0.36
                          Rpa0     0.793   246      246       277            100
                          Tif1,2   0.752   29       29        233            106            0.46
                          Yef3     0.777   36       36        14             ND

Heat shock                Hsc82    0.581   2        2.9       112            75             0.67
                          Hsp60    0.381   9        2.3       35             82             2.3
                          Hsp82    0.517   2        1.3       52             135            [illegible]
                          Hsp104   0.304   7        7         70             161            2.3
                          Kar2     0.439   5        10.1      43             102            2.4
                          Ssa1     0.709   2        4.3       303            421            1.4
                          Ssa2     0.802   10       5         213            324            1.5
                          Ssb1,2   0.850   50       50        270            85
                          Ssc1     0.521   2        2.6       68             80             1.2
                          Sse1     0.521   8        8         96             48             1.7
                          Sti1     0.247   1        1.1       25             44

Amino acid synthesis      Ade1     0.229   4        4         14             27
                          Ade3     0.276   2        1.7       12             [illegible]
                          Ade5,7   0.257   2        1.4       14             4
                          Arg4     0.229   1        8.1       41             41
                          Gdh1     0.585   10       27        148            [illegible]
                          Gln1     0.524   11       11        77             104
                          His4     0.267   3        3         15             [illegible]    [illegible]
                          Ilv5     0.801   6        6         152            [illegible]    [illegible]
                          Lys9     0.332   4        4         32             17             0.52
                          Met6     0.657   No Nla   22        190            80             0.42
                          Pro2     0.248   3        3         30             12
                          Ser1     0.258   2        1.2       15             8
                          Trp5     0.319   5        5         28             12

Miscellaneous             Act1     0.710   54       54        205            [illegible]    [illegible]
                          Adk1     0.531   No Nla             47             [illegible]
                          Ald6     0.520   3        3         181            [illegible]
                          Atp2     0.424   1        4.1       76             [illegible]    [illegible]
                          Bmh1     0.322   46       46        191            [illegible]    [illegible]
                          Bmh2     0.384   1        1.4       134            [illegible]
                          Cdc48    [...]
                          Mol1     [...] 0.497 [...]
                          Sam1     0.494   [...] 59 [...]
                          Sam2     0.497   [...]    15        63             20
                          Sod1     0.376   36       [...]
                          Uba1     0.212   [...]    2         14             20             0.44
                          YKL056   0.731   62       62        253            112
                          YLR109   0.549   21       21        930            40             0.20
                          YMR116   0.777   41       41        184

^a CAI, a measure of codon bias, is taken from the YPD. mRNA, number of mRNA molecules per cell from SAGE data (27); adjusted mRNA, number of mRNA
molecules per cell based on both SAGE and chip hybridization (30) (see Materials and Methods); Protein (Glu), number of molecules of protein per cell in
YNB-glucose; Protein (Eth), number of molecules of protein per cell in YNB-ethanol; E/G ratio, ratio of protein abundance in ethanol to glucose. The E/G ratio is
not given if it was close to 1 or if it was not repeatable (NR) in multiple gels. Some gene products (e.g., Tif1 and Tif2 [Tif1,2]) were difficult to distinguish on either
a protein or an mRNA basis; these are pooled. No Nla, there was no suitable NlaIII site in the 3' region of the gene, and so there are no SAGE mRNA data; No Met,
the mature gene product contains no methionines, and so there are no reliable protein data.

TABLE 2. Functions of proteins listed in Table 1

Name^a          YPD title lines^b

Adh1            Alcohol dehydrogenase I; cytoplasmic isozyme reducing acetaldehyde to ethanol, regenerating NAD+
Adh2            Alcohol dehydrogenase II; oxidizes ethanol to acetaldehyde, glucose repressed
Cit2            Citrate synthase, peroxisomal (nonmitochondrial); converts acetyl-CoA and oxaloacetate to citrate plus CoA
Eno1            Enolase 1 (2-phosphoglycerate dehydratase); converts 2-phospho-D-glycerate to phosphoenolpyruvate in glycolysis
Eno2            Enolase 2 (2-phosphoglycerate dehydratase); converts 2-phospho-D-glycerate to phosphoenolpyruvate in glycolysis
Fba1            Fructose bisphosphate aldolase II; sixth step in glycolysis
Hxk1            Hexokinase I; converts hexoses to hexose phosphates in glycolysis; repressed by glucose
Hxk2            Hexokinase II; converts hexoses to hexose phosphates in glycolysis and plays a regulatory role in glucose repression
Icl1            Isocitrate lyase, peroxisomal; carries out part of the glyoxylate cycle; required for gluconeogenesis
Pdb1            Pyruvate dehydrogenase complex, E1 beta subunit
Pdc1            Pyruvate decarboxylase isozyme 1
Pfk1            Phosphofructokinase alpha subunit, part of a complex with Pfk2p which carries out a key regulatory step in glycolysis
Pgi1            Glucose-6-phosphate isomerase, converts glucose-6-phosphate to fructose-6-phosphate
Pyc1            Pyruvate carboxylase 1; converts pyruvate to oxaloacetate for gluconeogenesis
Tal1            Transaldolase; component of nonoxidative part of pentose phosphate pathway
Tdh2            Glyceraldehyde-3-phosphate dehydrogenase 2; converts D-glyceraldehyde 3-phosphate to 1,3-diphosphoglycerate
Tdh3            Glyceraldehyde-3-phosphate dehydrogenase 3; converts D-glyceraldehyde 3-phosphate to 1,3-diphosphoglycerate
Tpi1            Triosephosphate isomerase; interconverts glyceraldehyde-3-phosphate and dihydroxyacetone phosphate
Efb1            Translation elongation factor EF-1β; GDP/GTP exchange factor for Tef1p/Tef2p
Eft1            Translation elongation factor EF-2; contains diphthamide which is not essential for activity; identical to Eft2p
Eft2            Translation elongation factor EF-2; contains diphthamide which is not essential for activity; identical to Eft1p
Prt1            Translation initiation factor eIF3 beta subunit (p90); has an RNA recognition domain
Rpa0 (RPP0)     Acidic ribosomal protein A0
Tif1            Translation initiation factor 4A (eIF4A) of the DEAD box family
Tif2            Translation initiation factor 4A (eIF4A) of the DEAD box family
Yef3            Translation elongation factor EF-3A; member of ATP-binding cassette superfamily
Hsc82           Chaperonin homologous to E. coli HtpG and mammalian HSP90
Hsp60           Mitochondrial chaperonin that cooperates with Hsp10p; homolog of E. coli GroEL
Hsp82           Heat-inducible chaperonin homologous to E. coli HtpG and mammalian HSP90
Hsp104          Heat shock protein required for induced thermotolerance and for resolubilizing aggregates of denatured proteins; important for [psi-]-to-[PSI+] prion conversion
Kar2            Heat shock protein of the endoplasmic reticulum lumen required for protein translocation across the endoplasmic reticulum membrane and for nuclear fusion; member of the HSP70 family
Ssa1            Cytoplasmic chaperone; heat shock protein of the HSP70 family
Ssa2            Cytoplasmic chaperone; member of the HSP70 family
Ssb1            Heat shock protein of HSP70 family involved in the translational apparatus
Ssb2            Heat shock protein of HSP70 family, cytoplasmic
Ssc1            Mitochondrial protein that acts as an import motor with Tim44p and plays a chaperonin role in receiving and folding of protein chains during import; heat shock protein of HSP70 family
Sse1            Heat shock protein of the HSP70 family; multicopy suppressor of mutants with hyperactivated Ras/cyclic AMP pathway
Sti1            Stress-induced protein required for optimal growth at high and low temperature; has tetratricopeptide repeats
Ade1            Phosphoribosylamidoimidazole-succinocarboxamide synthase; catalyzes the seventh step in de novo purine biosynthesis pathway
Ade3            C1-tetrahydrofolate synthase (trifunctional enzyme), cytoplasmic
Ade5,7          Phosphoribosylamine-glycine ligase plus phosphoribosylformylglycinamidine cyclo-ligase; bifunctional protein
Arg4            Argininosuccinate lyase; catalyzes the final step in arginine biosynthesis
Gdh1            Glutamate dehydrogenase (NADP+); combines ammonia and α-ketoglutarate to form glutamate
Gln1            Glutamine synthetase; combines ammonia to glutamate in ATP-driven reaction
His4            Phosphoribosyl-AMP cyclohydrolase/phosphoribosyl-ATP pyrophosphohydrolase/histidinol dehydrogenase; 2nd, 3rd, and 10th steps of his biosynthesis pathway
Ilv5            Ketol-acid reductoisomerase (acetohydroxy acid reductoisomerase) (alpha-keto-β-hydroxylacyl reductoisomerase); second step in Val and Ilv biosynthesis pathway
Lys9            Saccharopine dehydrogenase (NADP+, L-glutamate forming) (saccharopine reductase), seventh step in lysine biosynthesis pathway
Met6            Homocysteine methyltransferase (5-methyltetrahydropteroyl triglutamate-homocysteine methyltransferase), methionine synthase, cobalamin independent
Pro2            γ-Glutamyl phosphate reductase (phosphoglutamate dehydrogenase), proline biosynthetic enzyme
Ser1            Phosphoserine transaminase; involved in synthesis of serine from 3-phosphoglycerate
Trp5            Tryptophan synthase, last (5th) step in tryptophan biosynthesis pathway
Act1            Actin; involved in cell polarization, endocytosis, and other cytoskeletal functions
Adk1            Adenylate kinase (GTP:AMP phosphotransferase), cytoplasmic
Ald6            Cytosolic acetaldehyde dehydrogenase
Atp2            Beta subunit of F1-ATP synthase; 3 copies are found in each F1 oligomer
Bmh1            Homolog of mammalian 14-3-3 protein; has strong similarity to Bmh2p
Bmh2            Homolog of mammalian 14-3-3 protein; has strong similarity to Bmh1p
Cdc48           Protein of the AAA family of ATPases; required for cell division and homotypic membrane fusion
Cdc60           Leucyl-tRNA synthetase, cytoplasmic
Erg20           Farnesyl pyrophosphate synthetase; may be rate-limiting step in sterol biosynthesis pathway
Gpp1 (Rhr2)     DL-Glycerol phosphate phosphatase
Gsp1            Ran, a GTP-binding protein of the Ras superfamily involved in trafficking through nuclear pores
[...]           Inorganic pyrophosphatase, cytoplasmic
[...]           Component of serine C-palmitoyltransferase; first step in biosynthesis of long-chain base component of sphingolipids
Mol1 (Thi4)     Thiamine-repressed protein essential for growth in the absence of thiamine
Pab1            Poly(A)-binding protein of cytoplasm and nucleus; part of the 3'-end RNA-processing complex (cleavage factor I); has 4 RNA recognition domains
Psa1            Mannose-1-phosphate guanyltransferase; GDP-mannose pyrophosphorylase
Rnr4            Ribonucleotide reductase small subunit
Sam1            S-Adenosylmethionine synthetase 1
Sam2            S-Adenosylmethionine synthetase 2
Sod1            Copper-zinc superoxide dismutase
Uba1            Ubiquitin-activating (E1) enzyme
YKL056          Resembles translationally controlled tumor protein of animal cells and higher plants
YLR109 (Ahp1)   Alkyl hydroperoxide reductase
YMR116 (Asc1)   Abundant protein with effects on translational efficiency and cell size, has two WD (WD-40) repeats

^a Accepted name from the Saccharomyces genome database and YPD. Names in parentheses represent recent changes.
^b Courtesy of Proteome, Inc., reprinted with permission.

FIG. 2. Correlation of protein abundance with adjusted mRNA abundance.
The number of molecules per cell of each protein is plotted against the number
of molecules per cell of the cognate mRNA, with an r_p of 0.76. Note the
logarithmic axes. Data for mRNA were taken from references 27 and 30 and
combined as described in Materials and Methods.



correlation, confirming a strong correlation between CAI and 
protein abundance (Fig. 3). The relationship between CAI and 
protein abundance is log linear from about 1,000,000 to about 
10,000 molecules per cell. We have no data for rarer proteins. 

It is not clear whether CAI reflects maximum or average 
levels of protein expression. The proteins used for the CAI-protein 
correlation included some proteins which were not 
expressed at maximum levels under the condition of the experiment 
(Hsc82, Hsp104, Ssa1, Ade1, Arg4, His4, and others). 
When these proteins were removed from consideration and 
the correlation between CAI and the remaining (presumably 
constitutive) proteins was recalculated, the correlation 
coefficient was essentially unchanged (not shown). 

The equation describing the graph in Fig. 3 is log (protein 
molecules/cell) = (2.3 × CAI) + 3.7. Thus, under certain 
conditions (a CAI of 0.3 or greater; a constitutively expressed 
gene), a very rough estimate of protein abundance can be 
made by raising 10 to the power of [(2.3 × CAI) + 3.7]. 
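
The estimate described above can be sketched in a few lines of Python. This is only an illustration of the stated equation; the function name and the example CAI values are hypothetical, and the cutoff of 0.3 simply mirrors the condition given in the text.

```python
def estimate_protein_per_cell(cai: float) -> float:
    """Very rough protein molecules/cell from CAI.

    Uses the fit quoted in the text: log10(molecules/cell) = 2.3 * CAI + 3.7.
    The text limits the estimate to CAI >= 0.3 and constitutive genes.
    """
    if cai < 0.3:
        raise ValueError("the fit is only described for a CAI of 0.3 or greater")
    return 10 ** (2.3 * cai + 3.7)


# Hypothetical CAI values chosen only to show the spread of the estimate.
for cai in (0.3, 0.5, 0.9):
    print(f"CAI {cai:.1f}: about {estimate_protein_per_cell(cai):,.0f} molecules/cell")
```

Because the relationship is log linear, each increase of 0.1 in CAI multiplies the estimate by the same factor (10 to the power of 0.23, roughly 1.7).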

The distribution of CAI over the genome (Fig. 4) consists of 
a lower, bell-shaped distribution, possibly indicating a region 
where there is no selection for codon bias, and an upper, flat 
distribution, starting at a CAI of about 0.3, possibly indicating 
a region where there is selection for codon bias. Almost all of 
the proteins whose abundance we have measured are in the 
upper, flat portion of the distribution. In the lower, bell-shaped 
region, we do not know whether there is a correlation between 
CAI and protein abundance. 

Changes in protein abundance in glucose and ethanol. A 
comparison of cells grown in glucose (Fig. 1A) with cells grown 
in ethanol (Fig. 1B) is shown in Table 1. As is well known, 
some proteins are induced tremendously during growth on 
ethanol. Two striking examples are the peroxisomal enzymes 
Icl1 (isocitrate lyase) and Cit2 (citrate synthase), which are 
induced in ethanol by more than 100- and 12-fold, respectively 
(Fig. 1; Table 1). These enzymes are key components of the 
glyoxylate shunt, which diverts some acetyl coenzyme A 
(acetyl-CoA) from the tricarboxylic acid cycle to gluconeogenesis. 
S. cerevisiae requires large amounts of carbohydrate for its 
cell wall; in ethanol medium, this carbohydrate comes from 
gluconeogenesis, which depends on the glyoxylate shunt and 
on the glycolytic pathway running in reverse. The need for 



gluconeogenesis also explains why glycolytic enzymes are 
abundant even in ethanol medium. Thus, 2D gel analysis shows 
the prominence of the glycolytic and glyoxylate shunt enzymes 
in cells grown on ethanol, emphasizing that gluconeogenesis, 
presumably largely for production of the cell wall, is a major 
metabolic activity under these conditions. 

During gluconeogenesis, substrate-product relationships are 
reversed for the glycolytic enzymes. One might expect that not 
all glycolytic enzymes would be well adapted to the reverse 
reaction. Indeed, 2D gels show that in ethanol, Adh2 (alcohol 
dehydrogenase 2) is strongly induced (16), while its isozyme 
Adh1 is not greatly affected. Adh1 and Adh2 each interconvert 
acetaldehyde and ethanol. Adh1 has a relatively high K_m for 
ethanol (17 mM), while Adh2 has a lower K_m (0.8 mM) (5). 
Thus, it is thought that Adh1 is specialized for glycolysis 
(acetaldehyde to ethanol), while Adh2 is specialized for respiration 
(ethanol to acetaldehyde) (5, 29). Similarly, Eno1 (enolase 
1) is induced in ethanol, while its isozyme Eno2 (enolase 2) 
decreases in abundance (Table 1) (4, 19). Eno1 is inhibited by 
2-phosphoglycerate (the glycolytic substrate), while Eno2 is 
inhibited by phosphoenolpyruvate (the gluconeogenic substrate) 
(4). Perhaps Eno1 has a lower K_m for phosphoenolpyruvate 
than does Eno2, though to our knowledge this has not 
been tested. Thus, the 2D gels distinguish isozymes specialized 
for growth on glucose (Adh1 and Eno2) from isozymes specialized 
for ethanol (Adh2 and Eno1). 

Many heat shock proteins (e.g., Hsp60, Hsp82, Hsp104, and 
Kar2) were about twofold more abundant in ethanol medium 
than in glucose medium. This is consistent with the increased 
heat resistance of cells grown in ethanol (3). 

Enzymes involved in protein synthesis (Eft1, Rpa0, and Tif1) 
were about twice as abundant in glucose medium as in ethanol 
medium. This may reflect the higher growth rate of the cells in 
glucose. 

Phosphorylation of proteins. To examine protein phosphorylation, 
we labeled cells with 32P and ran 2D gels to examine 
phosphoproteins. About 300 distinct spots, probably representing 
150 to 200 proteins, could be seen on pH 4-8 gels (Fig. 5B). 
We then aligned autoradiograms of three gels, each with a 
different kind of labeled protein (32P only [Fig. 5B], 32P plus 
35S [Fig. 5A], and 35S only [not shown, but see Fig. 1 for 
example]). In this way, we made provisional identification of 
FIG. 3. Correlation of protein abundance with CAI. The number of molecules 
per cell of each protein is plotted against the CAI for that protein. Note the 
logarithmic scale on the protein axis. Data for the CAI are from the YPD 
database (13). 
FIG. 4. Distribution of CAI over the whole genome, shown in intervals of 0.030 (i.e., there are 150 genes with a CAI between 0.000 and 0.030, inclusive; 31 genes 
with a CAI between 0.031 and 0.060; 269 genes with a CAI between 0.061 and 0.090; 1,296 genes with a CAI between 0.091 and 0.120; etc.). The distribution peaks 
with 2,028 genes with a CAI between 0.121 and 0.150. 



some of the 32P-labeled spots as particular 35S-labeled spots. 
All such identifications are somewhat uncertain, since precise 
alignments are difficult, and of course multiple spots may exactly 
comigrate. Nevertheless, we believe that most of the 
provisional identifications are probably correct. Among the 
major 32P-labeled proteins are the hexokinases Hxk1 and 
Hxk2, the acidic ribosome-associated protein Rpa0, the translation 
factors Yef3 and Efb1, and probably Hsp70 heat shock 
proteins of the Ssa and Ssb families. Rpa0 and Efb1 are quantitatively 
monophosphorylated. 

Many yeast proteins resolve into multiple spots on these 2D 
gels (7). Yef3 has five or more spots, at least four of which 
comigrate with 32P. Tpi1 has a major spot showing no 32P 
labeling and a minor, more acidic spot which overlaps with 
some 32P label. Tif1 has at least seven spots (7); two of these 
overlap with some 32P label, but five do not (Fig. 5). Eft1 has 
at least three spots (7), and none of these overlap with 32P, 
although there are three nearby, unidentified 32P-labeled spots 
(a, c, and d in Fig. 5). Spots that seem to be extra forms of 
Met6, Pdc1, Eno2, and Fba1 can be seen in Fig. 6A, but there 
is little 32P at these positions in Fig. 5. Thus, phosphorylation 
explains some but not all of the different protein isoforms seen. 

The cell cycle is regulated in part by phosphorylation. We 
compared 32P-labeled proteins from cells synchronized in G1 
with α-factor, in cells synchronized in G1 by depletion of 
cyclins, and in cells synchronized in M phase with nocodazole. 
Only very minor differences were seen, and these were difficult 
to reproduce. The cell cycle proteins regulated by phosphorylation 
may not be abundant enough for this technique to be 
applied easily. 

Centrifugal fractionation. We fractionated 35S-labeled extracts 
by centrifugation (Materials and Methods). Figure 6A 
shows the proteins in the supernatant of a high-speed 
(100,000 × g, 30 min) centrifugation, while Fig. 6B shows the 
proteins in the pellet of a low-speed (16,000 × g, 10 min) 
centrifugation. Many proteins are tremendously enriched in 
one fraction or the other, while others are present in both. 



Most glycolytic enzymes (e.g., Tdh2, Tdh3, Eno2, Pdc1, Adh1, 
and Fba1) are enriched in the supernatant fraction. The only 
exception is Pfk1 (not indicated), which is found in both pellet 
and supernatant fractions. Many proteins involved in protein 
synthesis (Eft1, Yef3, Prt1, Tif1, and Rpa0) are in the pellet, 
possibly because of the association of ribosomes with the endoplasmic 
reticulum. However, Efb1 is in the supernatant, as is 
a substantial portion of the Eft1. Perhaps surprisingly, several 
mitochondrial proteins (Atp2 [not shown] and Ilv5) are largely 
in the supernatant. Perhaps glass bead breakage of cells releases 
mitochondrial proteins. The nuclear protein Gsp1 is in 
the pellet fraction. The enrichment produced by centrifugation 
makes it possible to see minor spots which are otherwise poorly 
resolved from surrounding proteins. Figure 6B shows that the 
previously identified Tif1 spot is surrounded by as many as six 
other spots that cofractionate. We observed six identical or 
very similar additional spots when we overexpressed Tif1 from 
a high-copy-number plasmid (not shown). Signal overlaps only 
one or two of these spots in 32P-labeling experiments (Fig. 5), 
and so the different forms are not mainly due to different 
phosphorylation states. 

DISCUSSION 

Our experience with developing a 2D gel protein database 
for S. cerevisiae is summarized here. With current technology, 
we can see the most abundant 1,200 proteins, which is about 
one-third to one-quarter of the proteins expressed. The remaining 
proteins will be difficult to see and study with the 
methods that we have used, not because of a lack of sensitivity 
but because weak spots are covered by nearby strong spots. 

Of the 1,200 proteins seen, we have identified 148, with a 
bias toward the most abundant proteins. Steady application of 
the methods already used would allow identification of most of 
the remaining proteins. Gene overexpression will be particu- 
larly useful, since it is not affected by the lower abundance of 
the remaining visible proteins. 
FIG. 5. Phosphorylated proteins. (A) Mixture of 32P-labeled proteins and 35S-labeled proteins. Two separate labeling reactions were done, one with 32P and one 
with 35S, and extracts were mixed and run on a 2D gel. Spots marked with numbers rather than gene names represent spots noted on 35S gels but unidentified. Spots 
labeling with 32P were identified by (i) increased labeling compared to the 35S-only gel (not shown); (ii) the characteristic fuzziness of a 32P-labeled spot; and (iii) the 
decay of signal intensity seen on exposures made 4 weeks later (not shown). A minor form of Tpi1 and at least six minor forms of Tif1 have been noted in overexpression 
experiments (see also Fig. 6B); positions of the minor forms are indicated by circles. (B) 32P-only labeling. The major form of Tpi1, which is not labeled with 32P, is 
indicated by a large circle; positions of seven forms of Tif1 are indicated by smaller circles. 
FIG. 6. Fractionation by centrifugation. (A) Proteins in the supernatant of a 100,000 × g, 30-min spin; (B) proteins in the pellet of a 16,000 × g, 10-min spin. Supernatant 
fractions examined in multiple experiments done over a wide range of g forces looked similar to each other, as did the pellet fractions. 
2D gels of the kind that we have used are not suitable for 
visualization of rare proteins. However, it will be possible to 
study on a global basis metabolic processes involving relatively 
abundant proteins, such as protein synthesis, glycolysis, glu- 
coneogenesis, amino acid synthesis, cell wall synthesis, nucle- 
otide synthesis, lipid metabolism, and the heat shock response. 

Gygi et al. (10) have recently completed a study similar to 
ours. Despite generating broadly similar data, Gygi et al. 
reached markedly different conclusions. We believe that both 
mRNA abundance and codon bias are useful predictors of 
protein abundance. However, Gygi et al. feel that mRNA 
abundance is a poor predictor of protein abundance and that 
"codon bias is not a predictor of either protein or mRNA 
levels" (10). These different conclusions are partly a matter of 
viewpoint. Gygi et al. focus on the fact that the correlations of 
mRNA and codon bias with protein abundance are far from 
perfect, while we focus on the fact that, considering the wide 
range of mRNA and protein abundance and the undoubted 
presence of other mechanisms affecting protein abundance, 
the correlations are quite good. 

However, the different conclusions are also partly due to 
different methods of statistical analysis and to real differences 
in data. With respect to statistics, Gygi et al. used the Pearson 
product-moment correlation coefficient (r_p) to measure the 
covariance of mRNA and protein abundance. Depending on 
the subset of data included, their r_p values ranged from 0.1 to 
0.94. Because of the low r_p values with some subsets of the 
data, Gygi et al. concluded that the correlation of mRNA to 
protein was poor. However, the r_p correlation is a parametric 
statistic and so requires variates following a bivariate normal 
distribution; that is, it would be valid only if both mRNA and 
protein abundances were normally distributed. In fact, both 
distributions are very far from normal (data not shown), and so 
a calculation of r_p is inappropriate. There was no statistical 
backing for the assertion that codon bias fails to predict protein 
abundance. 

We have taken two statistical approaches. First, we have 
used the Spearman rank correlation coefficient (r_s). Since this 
statistic is nonparametric, there is no requirement for the data 
to be normally distributed. Using the r_s, we find that mRNA 
abundance is well correlated with protein abundance (r_s = 
0.74), and the CAI is also well correlated with protein abundance 
(r_s = 0.80) (and also with mRNA abundance [data not 
shown]). For the data of Gygi et al. (10), we obtained similar 
results, though with their data the correlation is not as good; 
r_s = 0.59 for the mRNA-to-protein correlation, and r_s = 0.59 for 
the codon bias-to-protein correlation. 
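
As a minimal sketch of the first approach, the Spearman coefficient can be computed by ranking each variable and taking the Pearson correlation of the ranks. The helper names and the six data points below are hypothetical and are not taken from the study; they only show that the statistic depends on rank order and so is unaffected by skewed, non-normal values.

```python
from math import log10, sqrt


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical abundances (molecules per cell), for illustration only.
mrna = [0.5, 1.2, 3.0, 8.0, 40.0, 150.0]
protein = [2e3, 9e3, 6e3, 5e4, 3e5, 1.2e6]

r_s = spearman(mrna, protein)
# A monotonic transform such as log10 leaves the ranks, and so r_s, unchanged.
r_s_log = spearman([log10(v) for v in mrna], [log10(v) for v in protein])
print(f"r_s = {r_s:.3f}, r_s after log transform = {r_s_log:.3f}")
```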

In a second approach, we transformed the mRNA and protein 
data to forms where they were normally distributed, to 
allow calculation of an r_p (Materials and Methods). Two 
transformations, Box-Cox and logarithmic, were used; both gave 
good correlations with our data [e.g., r_p = 0.76 for log(adjusted 
RNA) to log(protein)]. We were not able to transform the data 
of Gygi et al. to a normal distribution. 
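
The logarithmic variant of this second approach can be illustrated as follows. The data points are hypothetical (not from either study), the helper name is an assumption, and the Box-Cox transformation is not shown. The sketch only demonstrates how a Pearson coefficient on raw, strongly skewed values can differ from the coefficient computed after a log transform.

```python
from math import log10, sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical abundances (molecules per cell), for illustration only.
mrna = [0.5, 1.2, 3.0, 8.0, 40.0, 150.0]
protein = [2e3, 9e3, 6e3, 5e4, 3e5, 1.2e6]

r_raw = pearson(mrna, protein)  # dominated by the few largest values
r_log = pearson([log10(v) for v in mrna], [log10(v) for v in protein])
print(f"raw r_p = {r_raw:.3f}, log-log r_p = {r_log:.3f}")
```

On raw values the coefficient is driven largely by the most abundant species; after the transform every order of magnitude contributes comparably, which is why the transformation is applied before a parametric statistic is calculated.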

Finally, there are also some differences in data between the 
two studies. These may be partly due to the different measure- 
ment techniques used: Gygi et al. measured protein abundance 
by cutting spots out of gels and measuring the radioactivity in 
each spot by scintillation counting, whereas we used phospho- 
rimaging of intact gels coupled to image analysis. We com- 
pared our data to theirs for the proteins common between the 
studies (but excluding proteins whose mRNAs are known to 
differ between rich and minimal media, and excluding Tifl, 
which was anomalous in differing by 100-fold between the two 
data sets). The correlation between the two protein data sets was 0.88 
(P < 0.0001). Although this is a strong correlation, the fact that 



it is less than 1.0 suggests that there may have been errors in 
measuring protein abundance in one or both studies. After 
normalizing the two data sets to assume the same amount of 
protein per cell, we found a systematic tendency for the protein 
abundance data of Gygi et al. to be slightly higher than ours for 
the highest-abundance proteins and also for the lowest-abun- 
dance proteins but slightly lower than ours for the middle- 
abundance proteins. These systematic differences suggest some 
systematic errors in protein measurement. Although we do not 
know what the errors are, we suggest the following as a rea- 
sonable speculation. For the highest-abundance proteins, we 
may have underestimated the amount of protein because of a 
slightly nonlinear response of the phosphorimager screens. For 
the lowest-abundance proteins, Gygi et al. may have overesti- 
mated the amount of protein because of difficulties in accu- 
rately cutting very small spots out of the gel and because of 
difficulties in background subtraction for these small, weak 
spots. The difference in the middle abundance proteins may be 
a consequence of normalization, given the two errors above. 

The low-abundance proteins in the data set of Gygi et al. 
have a poor correlation with mRNA abundance. We calculate 
that the r_s is 0.74 for the top 54 proteins of Gygi et al. but only 
0.22 for the bottom 53 proteins, a statistically significant difference. 
However, with our data set, the r_s is 0.62 for the top 33 
proteins and 0.56 (not significantly different) for the bottom 33 
proteins (which are comparable in abundance to the bottom 
53 proteins of Gygi et al.). Thus, our data set maintains a good 
correlation between mRNA and protein abundance even at 
low protein abundance. This is consistent with our speculation 
that protein quantification by phosphorimaging and image 
analysis may be more accurate for small, weak spots than is 
cutting out spots followed by scintillation counting. Our rela- 
tively good correlations even for nonabundant proteins may 
also reflect the fact that we used both SAGE data and RNA 
hybridization data, which is most helpful for the least abundant 
mRNAs. In summary, we feel that the poor correlation of 
protein to mRNA for the nonabundant proteins of Gygi et al. 
may reflect difficulty in accurately measuring these nonabun- 
dant proteins and mRNAs, rather than indicating a truly poor 
correlation in vivo. It is not surprising that observed correla- 
tions would be poorer with less-abundant proteins and 
mRNAs, simply because the accuracy of measurement would 
be worse. 
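
The text does not say which test was used to compare the correlation coefficients for the top and bottom groups of proteins. One common choice, shown here purely as an assumption, is the Fisher z-transformation test for two independent correlations; it is only approximate when applied to rank correlations. The function name is hypothetical, and the inputs are the coefficients and group sizes quoted above.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Two-sided Fisher z-test for two independent correlation coefficients.

    Returns (z, p). Assumed method: the source does not name its test.
    """
    z1, z2 = atanh(r1), atanh(r2)
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail area of the standard normal
    return z, p


# Gygi et al. data as quoted in the text: top 54 versus bottom 53 proteins.
z_gygi, p_gygi = compare_correlations(0.74, 54, 0.22, 53)
# Data set of this study as quoted in the text: top 33 versus bottom 33 proteins.
z_ours, p_ours = compare_correlations(0.62, 33, 0.56, 33)

print(f"Gygi et al.: z = {z_gygi:.2f}, p = {p_gygi:.4f}")
print(f"This study:  z = {z_ours:.2f}, p = {p_ours:.2f}")
```

Under this assumed test the first comparison falls well below a 0.05 threshold and the second does not, which agrees with the statements in the text.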

How well can mRNA abundance predict protein abundance? 
With r_p = 0.76 for logarithmically transformed mRNA 
and protein data, the coefficient of determination, (r_p)^2, is 0.58. 
This means that more than half (in log space) of the variation 
in protein abundance is explained by variation in mRNA abundance. 
When converted back to arithmetic values, protein 
abundances vary over about 200-fold (Table 1), and (r_p)^2 = 
0.58 for the log data means that of this 200-fold variation, 
about 20-fold is explained by variation in the abundance of 
mRNA and about 10-fold is unexplained (but could be due 
partly to measurement errors). For proteins much less abun- 
dant than those considered here, we imagine the in vivo cor- 
relation between mRNA and protein abundance will be worse, 
and other regulatory mechanisms such as protein turnover will 
be more important. 
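
The arithmetic behind the 20-fold and 10-fold figures can be reproduced under one reading of the paragraph: the 200-fold range is split in log space in proportion to the coefficient of determination. That reading is an assumption made here for illustration, and the variable names are hypothetical.

```python
from math import log10

r_p = 0.76
r_squared = r_p ** 2              # 0.5776, quoted in the text as 0.58
total_fold = 200.0                # approximate range of protein abundance

log_range = log10(total_fold)     # the range expressed in log10 units
explained_fold = 10 ** (r_squared * log_range)
unexplained_fold = 10 ** ((1.0 - r_squared) * log_range)

print(f"(r_p)^2 = {r_squared:.2f}")
print(f"explained: about {explained_fold:.0f}-fold")
print(f"unexplained: about {unexplained_fold:.0f}-fold")
```

The two factors multiply back to the full 200-fold range, and they round to the values given in the text.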

Some important conclusions can be drawn from this sam- 
pling of the proteome. First, there is an enormous range of 
protein abundance, from nearly 2,000,000 molecules per cell 
for some glycolytic enzymes to about 100 per cell for some cell 
cycle proteins (26a). Second, about half of all cellular protein 
is found in fewer than 100 different gene products, which are 
mostly involved in carbohydrate metabolism or protein synthesis. 
Third, the correlation between protein abundance and CAI 
is log linear as far as we can see, which is from about 10,000 
protein molecules per cell to about 1,000,000. This is somewhat 
surprising, because it implies that selective forces for codon 
bias are significant even at moderate expression levels. It also 
means that codon bias is a useful predictor of protein abun- 
dance even for moderately low bias proteins. Fourth, there is a 
good correlation between protein abundance and mRNA 
abundance for the proteins that we have studied. This validates 
the use of mRNA abundance as a rough predictor of protein 
abundance, at least for relatively abundant proteins. Fifth, for 
these abundant proteins, there are about 4,000 molecules of 
protein for each molecule of mRNA. This last conclusion 
raises questions as to how the levels of nonabundant proteins 
are regulated and suggests that protein instability, regulated 
translation, suboptimal rates of translation, and other mecha- 
nisms in addition to transcriptional control may be very impor- 
tant for these proteins. 